FUNGIpath: a tool to assess fungal metabolic pathways predicted by orthology
© Grossetête et al; licensee BioMed Central Ltd. 2010
Received: 25 June 2009
Accepted: 1 February 2010
Published: 1 February 2010
More and more completely sequenced fungal genomes are becoming available and many more sequencing projects are in progress. This deluge of data should improve our knowledge of the various primary and secondary metabolisms of Fungi, including their synthesis of useful compounds such as antibiotics or toxic molecules such as mycotoxins. Functional annotation of many fungal genomes is imperfect, especially of genes encoding enzymes, so we need dedicated tools to analyze their metabolic pathways in depth.
FUNGIpath is a new tool built using a two-stage approach. Groups of orthologous proteins predicted using complementary methods of detection were collected in a relational database. Each group was further mapped on to steps in the metabolic pathways published in the public databases KEGG and MetaCyc. As a result, FUNGIpath allows the primary and secondary metabolisms of the different fungal species represented in the database to be compared easily, making it possible to assess the level of specificity of various pathways at different taxonomic distances. It is freely accessible at http://www.fungipath.u-psud.fr.
As more and more fungal genomes are expected to be sequenced during the coming years, FUNGIpath should help progressively to reconstruct the ancestral primary and secondary metabolisms of the main branches of the fungal tree of life and to elucidate the evolution of these ancestral fungal metabolisms to various specific derived metabolisms.
Currently, the Fungi have more published nuclear genome sequences than any other eukaryotic taxonomic group . This relative abundance (28 genomes in May 2009) can be explained by their economic significance and their moderate genome size [Additional File 1]. Since several species are model organisms for fundamental, medical, or agronomical and industrial studies (e.g. Saccharomyces cerevisiae, Candida albicans, Yarrowia lipolytica), fungal genomes seem suitable for large-scale comparative studies, which will allow their evolution to be elucidated [2–4]. Several teams [5–7] have already performed extensive comparisons of a few fungal genomes to predict groups of orthologous proteins, using published methods such as Inparanoid , OrthoMCL  or TribeMcl .
Distribution of ID-EC per kingdom in public databases
Number of species displaying IDs annotated with EC number (ID-EC)
Median value of the set of ID-EC found per species
This remarkable situation arises largely because there is currently no tool for large-scale analyses of fungal metabolism, except for a preliminary attempt to identify enzymes in pathogenic fungi for a limited number of metabolic pathways . To cope with this major shortcoming, we designed a tool that allows us to mine genomic data by combining two complementary approaches: (i) defining reliable groups of orthologous proteins and (ii) mapping these groups on to the metabolic pathways that are described in KEGG  and MetaCyc .
Organizing relevant data for analyzing fungal metabolic pathways
Identifying enzyme activities requires relevant prediction of orthologs
As more and more genomic data become available, homology can be used to reconstruct the metabolic pathways of newly-sequenced organisms, taking the pathways of well-studied model organisms such as yeast as reference. Accordingly, one must identify the amino acid sequences encoding each step of each pathway in organisms that have not been studied experimentally [16–18]. However, there are two major drawbacks in this transfer of information. First, the accuracy of functional annotation of many fungal genomes is low because experimental data are lacking except in the case of yeast [19–21]. Secondly, it is difficult to predict reliable orthologs among all the putative homologs detected during exhaustive comparison of pairs of genomes. Numerous methods have been published but none appears completely infallible (for a recent review, see ). Thus, we decided to apply independent methods to the same dataset, collect as many potential orthologs as possible, and then compute their overlap. Exploring several methods raised the probability of finding consistent groups corresponding to this overlap. Accordingly, we used three different and complementary approaches based on similarity searches, and another based on the analysis of phylogenetic trees of families of homologs.
Searching pertinent orthologs
First, two published methods were used with their respective default parameters. Inparanoid  allows us to identify the orthologs and the inparalogs (genes duplicated since the last speciation event) during pairwise genome comparison. OrthoMCL  permits consistent strongly-related groups of orthologs (including inparalogs) to be identified.
Secondly, we improved the classical all-versus-all BLASTP  approach to identifying pairs of best reciprocal hits (BRH)  with a dedicated Perl script, enhancing the definition of orthologs by specifying two parameters, the alignment percentage and the score ratio, to filter the BLAST results. Local conservation was avoided by dividing the alignment length of each aligned sequence by its total length. The score ratio is defined as the ratio of the raw BLAST score computed by aligning a pair of sequences to the raw score of each sequence against itself (i.e. maximum score). Only results with score ratios over 0.2 and alignment percentages above 60% were kept for further studies.
These different methods based on sequence similarity yield various clusters of orthologous proteins that are more or less stringent depending whether single (e.g. Inparanoid) or multiple (e.g. BRH [Additional File 2]) links are used to build the orthologous protein groups.
Besides these methods based on similarity approaches, methods based on phylogenetic analysis have recently been developed to build orthologous groups [25, 26]. Here we chose a phylogenetic approach we had previously developed  to obtain groups of orthologous proteins, using automated analysis of trees of families of homologous proteins without a reference tree. The homologous proteins were first detected using BLASTP  with the following constraints: an E-value less than 0.001 and an alignment extending for at least 70% of the length of the shorter matching protein. For each family, a multiple alignment was built with Muscle , and the phylogenetic tree deduced was reconstructed using PhyML . The program Retree from the Phylip package  was further used to root the tree in order to distinguish orthologs from paralogs using automatic tree analysis .
Groups of orthologous proteins for the 20 genomes available in FUNGIpath predicted by four different methods
Relative percentage sharing between two methods
Identifying biologically relevant groups of orthologs
Although the overlap between these different methods for detecting orthologs appears narrow, we tried to build a consensus of the groups of orthologs using both union and intersection methods. Consideration of all the orthologs found merged large numbers of proteins (2,694 proteins in the largest group), with a trend towards amalgamating sometimes quite distant groups of orthologs. On the other hand, computing the crude intersection of the different methods also seemed inadequate (32 proteins in the largest group), since the BRH approach does not detect the inparalogs found by the other methods.
Sampling the orthologs in relevant groups
HMM profile and enrichment
Total number of groups
Size of the largest group
Comparing the orthologous groups predicted by FUNGIpath and by the four methods initially used
Percent of groups identical with FUNGIpath
Percent of groups specific in FUNGIpath
Assessing the reliability of the predicted final groups of orthologs
where m is the number of methods used for orthology prediction, I F, i is the number of orthologs shared (intersection) between the result of method i and the final group of orthologs, O F is the number of orthologs in the final group and G i is the number of groups obtained by method i for the set of proteins composing the final group. This confidence score is based on the assumption that the reliability of a final group increases with the number of independent methods that find it. Thus, if method i predicts the attested group, the score is 1. If not, the score is greater than 0 and less than or equal to 1. The average score (computed as the sum of scores for each method divided by the total number of methods m) was scaled from 0-10 by multiplying by 10; the higher the score, the better the agreement among the four methods. With this scoring approach, the user of FUNGIpath can evaluate the reliability of each predicted group of orthologs at any time.
Transferring EC number annotations to predicted groups of orthologs
Once the final groups of orthologs have been defined and attested, the functional annotations defined for well-studied proteins referenced in reliable public databases can be transferred to homologous unannotated amino acid sequences. For that purpose, an HMM profile was built for each final group of orthologs after multiple alignment of their sequences  and use of the HMMER programs (hmmbuild and then hmmcalibrate ). We then searched all the HMM profiles against the sequences annotated with a valid four-digit EC number available in Swiss-Prot release 56.7 using the HMMER hmmsearch program ). The Swiss-Prot functional annotation was transferred to all members of a group of orthologs displaying a best hit E-value ≤ 10-80. The E-value threshold was lowered to 10-20 if at least one sequence of the group of orthologs was already endowed with the same Swiss-Prot annotation.
This approach allows fungal annotation to be improved by using the enzymatic annotation of any protein, irrespective of the phylum in which it was first described. Accordingly, we could transfer 864 EC numbers to 1399 of the 12850 groups of orthologs; if the fungal Swiss-Prot annotations were directly transferred, the number of groups would be only 935. This allowed 160 EC numbers to be added that were not present in fungal genomes in Swiss-Prot .
Note that as many as 349 EC numbers (40% of the total of 864) are present in the 20 genomes.
Numbering pathways defined by KEGG and/or MetaCyc
Once the different putative orthologs had been annotated as described above, we used them to predicting the different metabolic pathways exhaustively in the completely sequenced fungi under study. To do that, we used two reliable public databases, KEGG  and MetaCyc , which differ in the way they define pathways.
KEGG  defines so-called reference pathways, agglomerating related elementary pathways, while MetaCyc  is a universal metabolic database that presents the elementary pathways encoded by various organisms (1,500) separately, including variants (similar biochemical functions using different biochemical routes or similar sets of reactions). KEGG  was used to extract useful information from the reaction file and to download all corresponding GIF maps. BIOPAX (BIOlogical PAthway eXchange) files defined in MetaCyc  were downloaded and we automatically generated map pictures by directed graph building. We thus collected 154 reference pathways in KEGG and 1386 elementary pathways in MetaCyc, which define the main anabolic and catabolic routes.
Challenging the FUNGIpath predictions
Comparing the S. cerevisiae enzymatic data published in four different databases with those predicted in FUNGIpath
Distribution of ID-EC (percentage of larger database content)
Specific to database 1
Specific to database 2
Comparing the S. cerevisiae enzymatic data predicted in FUNGIpath with public databases
Number of ID-EC in FUNGIpath
Number of differences at digit position
Total ID-EC in FUNGIpath
Comparison of enzymatic data between KEGG and FUNGIpath based on the 12 species they share
Number of ID-EC
Same ID with different EC
KEGG specific ID-EC
FUNGIpath specific ID-EC
A few examples of the proposed queries are given below.
For instance, querying the sequence UM03237.1 belonging to the Ustilago maydis genome defines a final group that is found whichever method is used (confidence score is maximal) and displays an alignment of quite good quality. Thus, the likelihood of this group of orthologous proteins seems quite reasonable if we combine the high-level quality of the score and the suitability of its alignment.
Searching a specific step in a pathway
Searching a specific EC number (Fig. 5a) allows the level of conservation of this enzyme activity in each taxonomic group to be assessed; also the full list of pathways to which this EC number is predicted to belong can be obtained directly (Fig. 5b). For instance, Fig. 5 shows that acylamide amidohydrolase (EC 18.104.22.168) is very well conserved in fungi and is involved in at least six different pathways in both the KEGG and MetaCyc databases (Fig. 5b). Since this activity is used in so many pathways of both primary and secondary metabolisms, it is not surprising to find this EC number in ten distinct groups of orthologous proteins ranging in size from 4 to 25 members (data not shown). The distribution of the different orthologs and inparalogs present in these groups can be further used to study the evolution of these different pathways using the approaches described in Fig. 4.
Searching a complete pathway
Fig. 6 shows that only two of the five steps in biotin biosynthesis are highly conserved. EC 22.214.171.124 is detected in all the species compared except Aspergillus oryzae and Magnaporthe grisea. EC 126.96.36.199 is absent from several species (Coprinus cinereus, Puccinia graminis, U. maydis, Batrachochytrium dendrobatidis and Phycomyces blakesleeanus). Thus, the KEGG reference pathway 'biotin metabolism' (Fig. 6a) appears to be incomplete in many fungi, since several of its specific enzyme activities (EC 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168 and 22.214.171.124) are not found. We may suppose that either these EC numbers exist in the fungi but are not currently detectable, or the fungi use other enzyme activities to catalyse these reactions (see below). Moreover, the further steps in biotinylation catalyzed by the ligases EC 126.96.36.199, 188.8.131.52, 184.108.40.206, and 220.127.116.11 are fully conserved in all the main taxonomic groups of fungi.
Fig. 6c further shows that most of the EC numbers (blue text) correspond to proteins that have no EC number assignment in Swiss-Prot but have been annotated in FUNGIpath by orthology prediction. Only two of the twenty genomes (S. cerevisiae and Schizosaccharomyces pombe) have an annotation in these two databases (bold text).
Fig. 7 shows that only six of the 13 EC numbers involved in the KEGG reference pathway 'terpenoid biosynthesis' appear to be conserved among the fungi analyzed. Of these six EC numbers, four (EC 18.104.22.168, 22.214.171.124, 126.96.36.199 and 188.8.131.52) are found in all the species present in FUNGIpath. Some EC numbers are missing from only one fungal group: this seems to be the case for EC 184.108.40.206, which is absent in the Taphrinomycotina group. Note, however, that this group is represented by only one species, namely S. pombe. Two EC numbers (220.127.116.11 in green and 18.104.22.168 in orange) seem to be specific to certain fungi.
Fungal metabolism is exceptionally rich and complex , generating a wide variety of secondary metabolic pathways as these organisms progressively evolved to invade new ecosystems. Except in a few model organisms, very few reactions have been studied experimentally. The present-day facility in obtaining complete genome sequences for organisms that have never been experimentally studied has revealed a wide gap between the knowledge gained by disclosing full repertoires of putative amino acid sequences and ignorance of their actual function.
To close this gap, one needs to transfer functional annotation to putative sequences by homology using inductive instead of hypothetico-deductive approaches (holism versus reductionism) . For metabolism, this allows entire pathways to be reconstructed . In order to facilitate the study of fungal metabolism and its evolution, we have created the tool FUNGIpath, which makes the predictions made on this homology basis publicly available. Thus, it was necessary to design new experimental approaches in order to obtain reliable and sound predictions.
Collecting reliable orthologs
The first requirement was to detect sound orthologs, knowing that there is no uniquely reliable way to do so . The most commonly-used approach is bidirectional best hits (BRH) of BLAST alignment, with imposition of strict criteria on discriminating E-value over a given alignment length, but various more sophisticated approaches have also been developed . Selecting the best method(s) is not easy. For instance, benchmarking tests suggested that Inparanoid performs best while BRH is good for closely-related species . More recently, BRH was found to give results comparable to the more sophisticated methods , but it is limited to finding only a single hit among the multiple possible links between paralogs.
We therefore preferred to use several different approaches simultaneously, three based on sequence similarity and one on phylogeny, to obtain robust results. Since the overlap between the outputs of these four methods is very narrow (a result underlining how conflicting these orthology methods are), we enriched the data found in the intersection of the different methods with a HMM approach. This allowed us to obtain fairly coherent sets of reliable orthologs forming well-defined groups that are of adequate size (the largest containing only 297 sequences) and biologically relevant.
Using reliable orthologs to improve functional annotation
The second requirement for exploiting these orthology data to predict metabolic pathways in fungal species that have never been studied experimentally was to assign a functional annotation to each group of orthologous proteins. To do that, a correspondence was established between a group and an EC number, defining an enzyme catalyzing a specific step in a known pathway included in the KEGG and MetaCyc databases. Figs. 5 and 6 show how group(s) of orthologs responsible(s) for a specific enzyme can be found and how this EC number is distributed in the different genomes. Inter alia, the multiple sequence alignment and the deduced phylogenetic tree can be obtained for each family of orthologs and inparalogs encoding this EC number in the fungi compared. We have provided evidence that FUNGIpath is a reliable tool for annotating enzyme function in an automatically predicted group of orthologous proteins. It gives data that are either comparable to those of the independently curated public databases or, in many cases, better (see Table 6). At any rate, most of the differences appear to be limited to the fourth digit, corresponding mainly to the nature of the substrates of the enzymes compared.
FUNGIpath is also useful for finding the set of orthologs that constitutes an entire pathway. This allows us to determine whether all the steps of the pathway have been predicted and, if so, in how many of the genomes compared that pathway is complete. Indeed, one of the main problems encountered in trying to reconstruct entire pathways from orthology data is the occurrence of missing data  such as pathway holes . The absence of an EC number (orphan metabolic activities ) may be due to a low percentage identity of the corresponding amino acid sequence or to its replacement with another protein. Alternatively, the simultaneous absence of several EC numbers that belong to a specific pathway would suggest that the entire pathway is absent from the species concerned. This is the case, for instance, in the later steps in the KEGG reference pathway 'terpenoid biosynthesis', where the last three EC numbers are missing (Fig. 7a). However, it is possible that this absence may simply be due to a major annotation problem or to the replacement of this pathway with an alternate, undetected, one.
Overall, FUNGIpath appears to be a useful and innovative tool for helping to resolve some artifactual pathway holes. For instance, it is unique in annotating a group of orthologs found in six species as EC 22.214.171.124 (Fig. 3), aristolochene synthase. No such amino acid sequences are predicted in Swiss-Prot, KEGG or MetaCyc, but the presence of aristolochene synthase has been demonstrated experimentally in two fungi not included in FUNGIpath [43, 44], supporting our prediction.
FUNGIpath appears to be a reliable tool for the analysis of fungal metabolism. It will be especially useful for annotating newly-sequenced genomes of poorly-studied organisms.
Moreover, it allows the respective metabolisms of various taxa to be compared easily. For instance, 101 EC numbers are found uniquely in ascomycetes (data not shown) and may help to delineate the metabolic specificities of the last common ancestor of this group.
As more and more genomes are expected to be decrypted in the near future, tools such as FUNGIpath will be very useful for the progressive reconstruction of primary and secondary metabolisms in the ancestors of the main branches of the present-day fungal tree and for elucidating the evolution of various specific derived metabolisms. FUNGIpath will be updated regularly (at least twice a year) with newly published fungal genomes.
Availability and requirements
The database is available at http://www.fungipath.u-psud.fr. This web site is optimized for Firefox 2.x and has been successfully tested for Safari 2.0.3 and Internet Explorer 7.0.
Lists of abbreviations
Best reciprocal hits
- EC number:
Enzyme Commission number
a unique protein identifier (ID) and EC number pair.
We are grateful to Philippe Silar for his help during the process of designing the web site and his helpful comments about this work. The computations were performed on the MIGALE platform (INRA, Jouy-en-Josas, France). Special thanks to the JGI, Broad Institute, NITE, Stanford University, Sanger Institute, Genoscope and Génolevures for the fungal genomes, Swiss-Prot for the protein annotations, and KEGG and MetaCyc for the metabolic pathways. SG is a PhD student supported by a 'Doctorant CNRS' fellowship. We thank the Université Paris-Sud (PPF Bioinformatique et Biomathématiques), and the Agence Nationale de la Recherche (ANR-05-MMSA-0009 MDMS_NV_10) for support. Finally, we thank two anonymous reviewers whose inputs have led to significant improvements in this work.
- Liolios K, Mavrommatis K, Tavernarakis N, Kyrpides NC: The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2007, 36: D475-479. 10.1093/nar/gkm884.PubMed CentralPubMedView ArticleGoogle Scholar
- Lutzoni F, Kauff F, Cox CJ, McLaughlin D, Celio G, Dentinger B, Padamsee M, Hibbett DS, James TY, Baloch E, Grube M, Reeb V, Hofstetter V, Schoch C, Arnold AE, Miądlikowska J, Spatafora J, Johnson D, Hambleton S, Crockett M, Shoemaker R, Sung GH, Lücking R, Lumbsch T, O'Donnell K, Binder M, Diederich P, Ertz D, Gueidan C, Hansen K, Harris RC, Hosaka K, Lim YW, Matheny B, Nishida H, Pfister D, Rogers J, Rossman A, Schmitt I, Sipman H, Stone J, Sugiyama J, Yahr R, Vilgalys R: Assembling the fungal tree of life: progress, classification, and evolution of subcellular traits. American Journal of Botany. 2004, 91: 1446-1480. 10.3732/ajb.91.10.1446.PubMedView ArticleGoogle Scholar
- Hibbett DS, Binder M, Bischoff JF, Blackwell M, Cannon PF, Eriksson OE, Huhndorf S, James T, Kirk PM, Lücking R, Thorsten Lumbsch H, Lutzoni F, Matheny PB, McLaughlin DJ, Powell MJ, Redhead S, Schoch CL, Spatafora JW, Stalpers JA, Vilgalys R, Aime MC, Aptroot A, Bauer R, Begerow D, Benny GL, Castlebury LA, Crous PW, Dai YC, Gams W, Geiser DM, Griffith GW, Gueidan C, Hawksworth DL, Hestmark G, Hosaka K, Humber RA, Hyde KD, Ironside JE, Kõljalg U, Kurtzman CP, Larsson KH, Lichtwardt R, Longcore J, Miadlikowska J, Miller A, Moncalvo JM, Mozley-Standridge S, Oberwinkler F, Parmasto E, Reeb V, Rogers JD, Roux C, Ryvarden L, Sampaio JP, Schüssler A, Sugiyama J, Thorn RG, Tibell L, Untereiner WA, Walker C, Wang Z, Weir A, Weiss M, White MM, Winka K, Yao YJ, Zhang N: A higher-level phylogenetic classification of the Fungi. Mycol Res. 2007, 11: 509-547. 10.1016/j.mycres.2007.03.004.View ArticleGoogle Scholar
- Soanes DM, Alam I, Cornell M, Wong HM, Hedeler C, Paton NW, Rattray M, Hubbard SJ, Oliver SG, Talbot NJ: Comparative Genome Analysis of Filamentous Fungi Reveals Gene Family Expansions Associated with Fungal Pathogenesis. PLoS ONE. 2008, 3 (6): e2300-10.1371/journal.pone.0002300.PubMed CentralPubMedView ArticleGoogle Scholar
- Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM: OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res. 2008, 36: D271-275. 10.1093/nar/gkm845.PubMed CentralPubMedView ArticleGoogle Scholar
- Hedeler C, Wong HM, Cornell J, Alam I, Soanes DM, Rattray M, Hubbard SJ, Talbot NJ, Oliver SG, Paton NW: e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genomics. 2007, 8: 426-10.1186/1471-2164-8-426.PubMed CentralPubMedView ArticleGoogle Scholar
- Marthey S, Aguileta G, Rodolphe F, Gendrault A, Giraud T, Fournier E, Lopez-Villavicencio M, Gautier A, Lebrun MH, Chiapello H: FUNYBASE: a FUNgal phYlogenomic dataBASE. BMC Bioinformatics. 2008, 9: 456-10.1186/1471-2105-9-456.PubMed CentralPubMedView ArticleGoogle Scholar
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314: 1041-1052. 10.1006/jmbi.2000.5197.PubMedView ArticleGoogle Scholar
- Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13: 2178-2189. 10.1101/gr.1224503.PubMed CentralPubMedView ArticleGoogle Scholar
- Enright AJ, Kunin V, Ouzounis CA: Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 2003, 31 (15): 4632-8. 10.1093/nar/gkg495.PubMed CentralPubMedView ArticleGoogle Scholar
- The UniProt Consortium: The universal protein resource (UniProt). Nucleic Acids Res. 2008, 36: D190-195. 10.1093/nar/gkn141.PubMed CentralView ArticleGoogle Scholar
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27: 29-34. 10.1093/nar/27.1.29.PubMed CentralPubMedView ArticleGoogle Scholar
- Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, Walk TC, Zhang P, Karp PD: The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2008, 36: 623-31. 10.1093/nar/gkm900.View ArticleGoogle Scholar
- Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). [http://www.chem.qmul.ac.uk/iubmb/enzyme/index.html]
- Giles PF, Soanes DM, Talbot NJ: A relational database for the discovery of genes encoding amino acid biosynthetic enzymes in pathogenic fungi. Comp Funct Genomics. 2003, 4 (1): 4-15. 10.1002/cfg.236.PubMed CentralPubMedView ArticleGoogle Scholar
- Boyer F, Viari A: Ab initio reconstruction of metabolic pathways. Bioinformatics. 2003, 19 (Suppl 2): ii26-34.PubMedView ArticleGoogle Scholar
- Overbeek R, Larsen N, Maltsev N, Pusch GD, Selkov E: WIT/WIT2: Metabolic reconstruction system. Bioinformatics, database and systems. 1999, Kluwer academic Publisher, Boston, USAGoogle Scholar
- Paley S, Karp PD: Evaluation of computational metabolic-pathway predictions for Helicobacter pylori. Bioinformatics. 2002, 18: 715-724. 10.1093/bioinformatics/18.5.715.PubMedView ArticleGoogle Scholar
- Brenner SE: Errors in genome annotation. Trends Genet. 1999, 15: 132-133. 10.1016/S0168-9525(99)01706-0.PubMedView ArticleGoogle Scholar
- Janssen P, Goldovsky L, Kunin V, Darzentas N, Ouzounis CA: Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications. EMBO Rep. 2005, 6: 397-399. 10.1038/sj.embor.7400412.PubMed CentralPubMedView ArticleGoogle Scholar
- Ouzounis CA, Karp PD: The past, present and future of genome-wide re-annotation. Genome Biology. 2002, 3: comment2001.1-2001.6. 10.1186/gb-2002-3-2-comment2001.View ArticleGoogle Scholar
- Kuzniar A, van Ham RC, Pongor S, Leunissen JA: The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 2008, 24: 539-551. 10.1016/j.tig.2008.08.009.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-10.PubMedView ArticleGoogle Scholar
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999, 96: 2896-2901. 10.1073/pnas.96.6.2896.PubMed CentralPubMedView ArticleGoogle Scholar
- Lemoine F, Lespinet O, Labedan B: Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data. BMC Evol Biol. 2007, 7: 237-10.1186/1471-2148-7-237.PubMed CentralPubMedView ArticleGoogle Scholar
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19: 327-35. 10.1101/gr.073585.107.PubMed CentralPubMedView ArticleGoogle Scholar
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-7. 10.1093/nar/gkh340.PubMed CentralPubMedView ArticleGoogle Scholar
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.PubMedView ArticleGoogle Scholar
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. 2005, Department of Genome Sciences, University of Washington, SeattleGoogle Scholar
- Eddy SR: Profile Hidden Markov models. Bioinformatics. 1998, 14 (9): 755-63. 10.1093/bioinformatics/14.9.755.PubMedView ArticleGoogle Scholar
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, Weng S, Botstein D: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-79. 10.1093/nar/26.1.73.PubMed CentralPubMedView ArticleGoogle Scholar
- Claudel-Renard Clotilde, Chevalet Claude, Faraut Thomas, Kahn Daniel: Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Research. 2003, 22: 6633-6639. 10.1093/nar/gkg847.View ArticleGoogle Scholar
- Green ML, Karp PD: Genome Annotation Errors in Pathway Databases Due to Semantic Ambiguity in Partial EC Numbers. Nucleic Acids Research. 2005, 33: 4035-4039. 10.1093/nar/gki711.PubMed CentralPubMedView ArticleGoogle Scholar
- PostgreSQL database management system. [http://www.postgresql.org/]
- Vaidyanathan S, Harrigan GG, Goodacre R: Metabolome analyses: strategies for systems biology. Metabolites and Fungal Virulence. Edited by: Driggers EM, Brakhage AA. 2005, Springer, 367-381.Google Scholar
- Maynard Smith J: The Problems of Biology. 1986, Oxford University PressGoogle Scholar
- Oliver SG: From DNA sequence to biological function. Nature. 1996, 379: 597-600. 10.1038/379597a0.PubMedView ArticleGoogle Scholar
- Hulsen T, Huynen M, de Vlieg J, Groenen P: Benchmarking ortholog identification methods using functional genomics data. Genome biology. 2006, 7 (4): R31-10.1186/gb-2006-7-4-r31.PubMed CentralPubMedView ArticleGoogle Scholar
- Altenhoff A, Dessimoz C: Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods. PLoS Computational Biology. 2009, 5 (1): e1000262-10.1371/journal.pcbi.1000262.PubMed CentralPubMedView ArticleGoogle Scholar
- Osterman A, Overbeek R: Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol. 2003, 7: 238-251. 10.1016/S1367-5931(03)00027-9.PubMedView ArticleGoogle Scholar
- Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics. 2004, 5: 76-10.1186/1471-2105-5-76.PubMed CentralPubMedView ArticleGoogle Scholar
- Chen L, Vitkup D: Predicting genes for orphan metabolic activities using phylogenetic profiles. Genome Biol. 2006, 7: R17-10.1186/gb-2006-7-2-r17.PubMed CentralPubMedView ArticleGoogle Scholar
- Proctor RH, Hohn TM: Aristolochene synthase. Isolation, characterization, and bacterial expression of a sesquiterpenoid biosynthetic gene (Ari1) from Penicillium roqueforti. J Biol Chem. 1993, 268: 4543-4548.PubMedGoogle Scholar
- Cane DE, Kang I: Aristolochene synthase: purification, molecular cloning, high-level expression in Escherichia coli, and characterization of the Aspergillus terreus cyclase. Arch Biochem Biophys. 2000, 376: 354-64. 10.1006/abbi.2000.1734.PubMedView ArticleGoogle Scholar