- Research article
- Open Access
Combining multiple functional annotation tools increases coverage of metabolic annotation
- Marc Griesemer†1,
- Jeffrey A. Kimbrel†1,
- Carol E. Zhou2,
- Ali Navid1 and
- Patrik D’haeseleer1, 2Email authorView ORCID ID profile
© The Author(s). 2018
- Received: 20 August 2018
- Accepted: 5 November 2018
- Published: 19 December 2018
Genome-scale metabolic modeling is a cornerstone of systems biology analysis of microbial organisms and communities, yet these genome-scale modeling efforts are invariably based on incomplete functional annotations. Annotated genomes typically contain 30–50% of genes without functional annotation, severely limiting our knowledge of the “parts lists” that the organisms have at their disposal. These incomplete annotations may be sufficient to derive a model of a core set of well-studied metabolic pathways that support growth in pure culture. However, pathways important for growth on unusual metabolites exchanged in complex microbial communities are often less understood, resulting in missing functional annotations in newly sequenced genomes.
Here, we present results on a comprehensive reannotation of 27 bacterial reference genomes, focusing on enzymes with EC numbers annotated by KEGG, RAST, EFICAz, and the BRENDA enzyme database, and on membrane transport annotations by TransportDB, KEGG and RAST. Our analysis shows that annotation using multiple tools can result in a drastically larger metabolic network reconstruction, adding on average 40% more EC numbers, 3–8 times more substrate-specific transporters, and 37% more metabolic genes. These results are even more pronounced for bacterial species that are phylogenetically distant from well-studied model organisms such as E. coli.
Metabolic annotations are often incomplete and inconsistent. Combining multiple functional annotation tools can greatly improve genome coverage and metabolic network size, especially for non-model organisms and non-core pathways.
- Genome annotation
- Functional annotation
- Enzyme prediction
- Transport prediction
- Metabolic modeling
In the early days of genome sequencing, functional annotation involved computational prediction of gene function coupled with extensive manual curation by teams of experts [1–3]. Today, with the exponential explosion of DNA sequencing  the fraction of genes that have undergone any degree of manual curation or even experimental validation is becoming vanishingly small [5, 6]. Automated gene annotation tools employing different methodologies with minimal manual curation are widely used, functionally annotating by homology to existing annotations, or by identification of conserved domains/motifs within a coding sequence [7, 8]. Many draft genomes and metagenome bins are often run through a single annotation pipeline where genome annotations are inherited from previous genome annotations. Even when multiple annotation tools are used, integrating the different outputs in a cohesive manner remains a major challenge [9, 10]. However, individual annotation tools often return annotations for different subsets of genes, offering the potential to greatly increase the coverage of annotations by combining the outputs of multiple tools if the barrier for integration can be overcome. Accurate and comprehensive genome annotations are essential for deciphering evolutionary, systematic and ecosphere functions of sequenced organisms. In particular, constraint-based metabolic modeling has emerged as a key tool in systems biology . Genome-scale metabolic models implicitly assume complete and accurate functional annotation. However, 30–50% of genes in a typical genome still lack any functional annotation , a statistic which has not improved much over the past two decades of genome sequencing . More than 30% of these unannotated genes are estimated to have metabolic functions  leaving a significant gap in our understanding of the underlying metabolic processes. In addition, annotated genes include a large (and potentially growing) fraction of misannotations . Further, high-throughput untargeted metabolomics often contain a large fraction of peaks that cannot be reliably matched to any known metabolites , and many of those that can be identified often do not match any metabolic reconstruction of the microbial species involved , providing another strong indication of the extent of microbial metabolism we are missing with current metabolic annotation methods. As the scale and complexity of the biosystems that we study increases, there is a need to examine the metabolic interactions between constituent organisms and asses the system-level outcome of these exchanges. Metabolic modeling efforts, are moving beyond studying core metabolic pathways in a single organism towards multi-species models, real-world communities and ecosystems [17–19], and incorporation of complex ‘omics and metabolite data, emphasizing the need for a more complete coverage of the metabolic functions identified in microbial genomes, a much wider range of metabolites found in complex communities, and a greater emphasis on transport and exchange processes.
In constraint-based genome-scale modeling methods such as Flux Balance Analysis , the issue of missing metabolic annotations is dealt with by “gap filling” - the addition of a set of metabolic reactions beyond those that were derived directly from the genome annotation . A variety of gap filling algorithms have been developed to predict the missing reactions necessary to make the metabolic network model sufficiently complete to produce biomass [21–24]. In a broad collection of 130 genome-scale metabolic models added to the ModelSEED database , on average 56 additional gap filled reactions were needed for each model to produce biomass in simple defined nutrient media. Even after those additions an average of one-third of the reactions in each model were still blocked, meaning that there were still enough reactions missing in the network to preclude metabolic flux through those reactions [25, 26]. In addition, the number of reactions that can partake in a gap filling solution is vast (3270 in the case of E. coli), and the sets of reactions generated by different gap filling algorithms may have little or no overlap with each other . Clearly, a more complete identification and annotation of metabolic reactions would be preferable to the addition of dozens of poorly supported reactions just to patch the holes in the network.
Recent genome-scale modeling of Clostridium beijerinckii NCIMB 8052  demonstrated that the total number of genes and reactions included in the final curated model could be almost doubled by incorporating multiple annotation tools. The reconstruction of the C. beijerinckii metabolic network used 3 different database sources (SEED , KEGG , and RefSeq annotations captured in BioCyc ) to evaluate annotation coverage and produce a more robust model. Each annotation source contributed only about half of the reactions in the final curated model. Only a third of the reactions were present in all three sources, and these reactions were found to contribute significantly to a core set of active reactions in validation simulations. The small overlap between annotations was not simply due to any source contributing more heavily to a particular area of metabolism, nor did any one source outperform another in terms of model connectivity. Likewise, in an analysis of nine prokaryotic genomes, the three enzyme annotation sources used – NCBI, KEGG, and the PEDANT protein database  – only agreed on less than one third of the annotated genes .
Accurate metabolic models also rely on accurate determination of substrate transport between the bacterium and its environment. Transporter annotations have rarely been used in genome scale metabolic modeling because of the difficulty in computationally determining the exact substrate being transported. Because of this, many metabolic modeling methods simply assume that a transporter exists for the import of any necessary metabolite - an assumption that is incorrect in some cases. For example, the yogurt bacterium Streptococcus thermophilus has an unusual growth phenotype in that it grows poorly on glucose, even though it possesses all the required metabolic enzymes . Instead it preferably imports the disaccharide lactose, hydrolyzes it to glucose and galactose, then secretes the galactose back out of the cell. S. thermophilus lacks the typical glucose phosphotransferase system used by many bacteria, and instead has an efficient lactose import mechanism that makes it well adapted to grow in milk . Prediction tools such as TransportDB’s Transporter Automatic Annotation Pipeline (TransAAP, ) now allow researchers to generate substrate predictions that are sufficiently detailed to be included in metabolic pathways, and could give insights into growth or metabolite exchange phenotypes that are not readily apparent from the metabolic pathways present in the genome.
Reference genomes used in this study
Mycobacterium tuberculosis CDC1551
Mycobacterium tuberculosis H37Rv
Streptomyces coelicolor A3(2)
NC_003888, NC_003903, NC_003904
Bacteroides thetaiotaomicron VPI-5482
Candidatus Cardinium hertigiib
HG422566, CBQZ010000001- CBQZ010000011
Synechococcus elongatus PCC 7942
Listeria monocytogenes 10403S
Bacillus anthracis Ames
NC_003997, AE017335, AE017336
Bacillus subtilis 168
Clostridium saccharoperbutylacetonicum ATCC 27021
Eubacterium rectale ATCC 33656
Clostridioides difficile 630
Agrobacterium fabrum C58
AE008687, AE008688, AE008689, AE008690
Aurantimonas manganoxydans SI85-9A1
Caulobacter crescentus CB15
Caulobacter crescentus NA1000
Escherichia coli CFT073
Escherichia coli K-12 substr. W3110
Escherichia coli B str. REL606
Escherichia coli K-12 substr. MG1655a
Escherichia coli O157:H7 str. EDL933
Candidatus Evansia muellerib
Helicobacter pylori 26,695
Methylosinus trichosporium OB3b
Candidatus Portiera aleyrodidarum BT-QVLCb
Shigella flexneri 2a str. 2457 T
Vibrio cholerae O1 biovar El Tor str. N16961
Discrepancies in metabolic annotations between different tools
Percentage of gene-EC annotation agreements that exist between pairs of tools
The denominator is the number of genes across the 27 reference genomes that are covered by both tools. The numerator counts the number of such genes for which both tools provide at least one identical gene-EC number annotation.
One major caveat of using E. coli to evaluate the quality of annotation tools is that so much of our knowledge of microbial metabolism is based on E. coli, and therefore annotation tools can be expected to be trained or optimized on E. coli to some extent, so performance on E. coli is not necessarily indicative of results on other organisms. For example, the KEGG annotation provided by KAAS is done by calculating bidirectional best BLAST hits against annotated reference genomes including E. coli, essentially providing a direct lookup of E. coli annotations in the KEGG database.
Coverage of the metabolic network reconstruction
EFICAz produced the least number of unique EC numbers but had high agreement with RAST and KEGG (86% of EFICAz EC numbers were also generated by both RAST and KEGG – see Fig. 3), even though it uses very different annotation methods, which suggests it can be helpful in validating those annotation tools. Note that EFICAz also produces incomplete “three-digit” EC number annotations (e.g. 1.2.3.-) which may be useful for hole filling but were not considered in this analysis.
BLASTing against the BRENDA database of reference enzymes produced the smallest number of annotations, but a high fraction of unique EC numbers. Interestingly, of the top 10 unique EC numbers produced by this method, only one is also covered by RAST and KEGG, two of the EC numbers have been deprecated by the Enzyme Commission, and six are EC numbers that have been assigned in 2000 or later and may not have been incorporated into the predictions by the other annotation tools yet. So even though a simple BLAST against a reference database such as BRENDA proves to be one of the less effective means for assigning metabolic functions (compared to the more sophisticated HMM or protein family based methods used by the other tools), it may still have some value to capture recently described enzymes not already covered by the other tools.
While counting the number of EC numbers reflects the size of the metabolic network, counting the numbers of genes that have received any metabolic annotation reflects the genome annotation coverage. Additional file 1: Figure S1 shows the number of genes in all of the reference genomes annotated with one or more EC numbers by each of the tools. On average, 1361 genes per genome were assigned a function with at least one tool, and more than 65% of these genes were assigned a metabolic function by more than one tool. The results show that just as each tool adds a significant number of reactions to the metabolic network model, each tool also significantly contributes to the number of genes covered with metabolic annotations.
The EC numbers on which the different tools most often agree across the 27 reference genomes tended to belong to well-studied core metabolic pathways. Out of the 79 EC numbers on which all four tools agreed in at least half of the genomes (Additional file 2), more than three quarters (61/79) were involved in biosynthesis or biodegradation of amino acids, nucleotides, carbohydrates, and cofactors; or in the processing of RNA, DNA and proteins. In contrast, almost none of these EC numbers were involved in biosynthesis or degradation of fatty acids, lipids, aromatics compounds, or secondary metabolites.
Definitions of Precision, Recall and associated terms
EC numbers predicted by tools and found in EcoCyc.
EC numbers predicted by tools and not found in EcoCyc.
EC numbers in EcoCyc but not predicted by tools.
TP/(TP + FP)
Fraction of predicted EC numbers that are in EcoCyc.
TP/(TP + FN)
Fraction of EC numbers in EcoCyc correctly predicted by tools.
A more lenient annotation policy (e.g. merging annotations from all tools) will tend to generate fewer false negatives but more false positives, achieving a higher Recall at the expense of lower Precision. Conversely, a more restrictive annotation policy (e.g. only including EC numbers if all tools agree on them) can increase Precision, but at the expense of a lower Recall. Figure 4 shows a plot of precision versus recall for all the different combinations of tools. Out of the individual tools, KEGG performed best in terms of both precision and recall on this dataset (although as mentioned before, performance on E. coli K12 may not reflect performance on other genomes), and a simple Blast against the BRENDA database performed worst. Combinations that contain some union of the tools have a higher recall than each of the individual tools in the combination, but a somewhat lower precision (> 80%). In contrast, intersections of annotations from two or more tools show very high (> 90%) precision but much lower recall (< 65%). A consensus annotation that produces both higher recall and higher precision might be achieved by means of a weighted sum of all the annotation sources, similar to the approach taken by EnzymeDetector .
Examples of substrate annotation ranking
Metabolite that can be incorporated as a transport reaction in a metabolic model
Substrate(s) that map to a small number of possible transport reactions
• aromatic amino acid
Broader substrate classes not directly usable to construct a metabolic network
Very broad class of substrates
• multidrug efflux
No substrate annotated
Transporter annotations by RAST, KEGG and TransportDB showed surprisingly little overlap. Out of the more than 15,000 genes annotated as transporters (regardless of substrate prediction), the three tools only agree on 2.8% (423/15,161). Out of those, only 130 genes are annotated by all three tools with a specific substrate prediction (ranks 1–2). When two or more tools provide a sufficiently specific substrate annotation, the substrate annotations tend to agree 85% of the time, even if they may not be identical (for example, one transporter was annotated as “leucine/valine”, “leucine”, and “branched-chain amino acid” by TransportDB, RAST and KEGG respectively). Overall, the detailed transporter annotations by TransportDB’s Transporter Automatic Annotation Pipeline provide a significant advance over more general metabolic annotation tools such as RAST and KEGG.
This analysis has led us to make recommendations for providing a more comprehensive metabolic genome annotation. We found that a single annotation tool is often insufficient unless one is only interested in core metabolism where different tools often agree. Organisms that are phylogenetically far removed from well-studied model organisms are particularly susceptible, in which case annotation tools will tend to diverge far more. In addition, one can trade off confidence in predictions versus greater coverage by using the intersection or union of multiple annotation tools. BLASTing against a database of reference sequences is generally an inefficient method for annotating enzymes but may be useful to cover more recently assigned EC numbers not yet included by other tools. Still, all these efforts require manual curation to bring together annotation from multiple sources. More tool development is needed to merge annotations beyond simple EC numbers, and a universal reference database for well-balanced reactions and metabolites would be a very valuable resource to merge annotations that use different reaction nomenclatures [38–40]. Likewise, now that annotation tools such as TransportDB are producing significant numbers of transporter annotations with substrate predictions that are precise enough to be included in metabolic modeling, more tool development may be needed to fully take advantage of these substrate predictions in Flux Balance Analysis methods, and move beyond the current implicit assumption used by most algorithms that all metabolites can be transported when needed.
We focused on a total of 27 genomes of BioCyc Tier 1 & Tier 2 bacteria . These genomes are from a range of phyla, including 15 Proteobacteria, 6 Firmicutes, 3 Actinobacteria, 2 Bacteroidetes and 1 Cyanobacteria. In addition to a range of phyla, these 27 organisms also cover different lifestyles, including a human gut symbiont (Bacteroides thetaiotaomicron VPI-5482), pathogens (e.g. Mycobacterium tuberculosis), an obligate insect endosymbiont (Candidatus Evansia muelleri), an unusual manganese oxidizing bacterium (Aurantimonas manganoxydans SI85-9A1), and a photoautotrophic cyanobacterium (Synechococcus elongatus PCC 7942). Genbank files were downloaded from NCBI (accessions listed in Table 1), and standardized to remove all functional annotation, retaining only the original open reading frames and locus tag/protein identification information (Additional file 4).
RAST (Rapid Annotation Subsystem Technology, ) is an open-source web server for genome annotation, using an assignment propagation strategy based on manually curated subsystems and subsystem-based protein families that automatically guarantees a high degree of assignment consistency. RAST returns an analysis of the genes and subsystems in each genome. We used the NMPDR website  to generate genome-wide annotations for our 27 reference genomes and parsed any EC numbers from the functional annotation. This is also the core metabolic annotation tool used by the popular ModelSEED tool for generating draft genome-scale models of metabolism .
KEGG (Kyoto Encyclopedia of Genes and Genomes, ) is a collection of genome and pathway databases for systems biology. We used KAAS (KEGG Automatic Annotation Server [43, 44]) to generate genome-wide annotations for our 27 reference genomes. KAAS assigns KEGG Orthology (KO) numbers using the bi-directional best hit method (BBH) against a set of default prokaryotic genomes in the KEGG database. We mapped KO numbers to EC numbers using a mapping table provided by the KEGG BRITE Database .
EFICAz  uses large scale inference to classify enzymes into functional families, combining 4 methods into a single approach without the need for structural information. Recognition of functionally discriminating residues (FDR) allows EFICAz to use a method called evolutionary footprinting. EFICAz has been rigorously crossed validated, and achieves very high precision and recall, even for sequence similarities to known enzymes as low as 40%. We used a local install of EFICAz2.5 to generate EC number predictions for our 27 reference genomes.
BRENDA  is a database containing 7.2 million enzyme sequences categorized into 82,568 enzymatic functions based on the literature and contains functional and molecular information such as nomenclature, structure, and substrate specificity. We annotated our 27 reference genomes based on a BLAST search at > 60% sequence identity against a local copy of the 2011 BRENDA database of enzyme reference sequences.
All EC number annotations are available in Additional file 5.
Where available, pre-generated transporter annotations were downloaded from the TransportDB 2.0 database . For those reference genomes that were not already present in the database (M. trichosporium OB3b, Candidatus C. hertigii, Candidatus E. muelleri, and A. manganoxydans SI85-9A1), we submitted the predicted protein sequences to the TransAAP web-based transporter annotation tool using the default parameters . We also retrieved transporter annotations from RAST and KEGG. For RAST, annotations were filtered for the subsystem for “Membrane Transport”. For KEGG, we mapped KO numbers to transporter annotations using a mapping table provided by the KEGG BRITE Database . Substrate names were ranked from most to least specific (see Table 3). Substrates that can be incorporated as a transport reaction in a metabolic model were ranked 1 and 2. Broader substrate classes that could be used for gap filling or interpretation of transcriptomics data were ranked 3 and 4, and annotated transporters without substrate prediction were ranked 5. The full table of substrate names and ranking can be found in Additional file 6.
This work was supported by the Department of Energy through the Genomic Science Program as part of the LLNL Biofuels SFA (contract SCW1039–02). Work at LLNL was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (IM: LLNL-JRNL-750275).
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files. Accession numbers for the genomes used are included in Table 1.
PD conceived of the study. MG, JK, and PD designed the study, processed, and analyzed data. CEZ helped to produce the EFICAz and BRENDA data. AN provided supervision for the study MG, JK, and PD wrote the manuscript with input from CEZ and AN. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kyrpides NC. Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream. Nat Biotechnol. 2009;27:627–32 Nature Publishing Group.View ArticleGoogle Scholar
- Kyrpides NC, Ouzounis CA. Whole-genome sequence annotation:“Going wrong with confidence.”. Mol Microbiol. 1999;32:886–7.View ArticleGoogle Scholar
- Koonin EV, Mushegian AR, Rudd KE. Sequencing and analysis of bacterial genomes. Curr Biol. 1996;6:404–16.View ArticleGoogle Scholar
- Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, et al. Big Data: Astronomical or Genomical? PLoS Biol. 2015;13:e1002195 Public Library of Science.View ArticleGoogle Scholar
- Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, et al. The future of biocuration. Nature. 2008;455:47–50.View ArticleGoogle Scholar
- Baumgartner WA Jr, Cohen KB, Fox LM, Acquaah-Mensah G, Hunter L. Manual curation is not sufficient for annotation of genomic databases. 2nd ed. Bioinformatics. 2007;23:i41–8.View ArticleGoogle Scholar
- Friedberg I. Automated protein function prediction—the genomic challenge. Brief. Bioinform. 2006;7:225–42 Oxford University Press.View ArticleGoogle Scholar
- Médigue C, Moszer I. Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol. 2007;158:724–36 Elsevier Masson.View ArticleGoogle Scholar
- Ijaq J, Chandrasekharan M, Poddar R, Bethi N, Sundararajan VS. Annotation and curation of uncharacterized proteins- challenges. Front Genet. 2015;6:1750.View ArticleGoogle Scholar
- Land M, Hauser L, Jun S-R, Nookaew I, Leuze MR, Ahn T-H, et al. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics. 2015;15:141–61 Springer.View ArticleGoogle Scholar
- Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010;28:245–8.View ArticleGoogle Scholar
- Hanson AD, Pribat A, Waller JC, de Crécy-Lagard V. ‘Unknown’ proteins and “orphan” enzymes: the missing half of the engineering parts list - and how to find it. Biochem J. 2010;425:1–11 Portland Press Limited.View ArticleGoogle Scholar
- Ellens KW, Christian N, Singh C, Satagopam VP, May P, Linster CL. Confronting the catalytic dark matter encoded by sequenced genomes. Nucleic Acids Res. 2017;45:11495–514 Oxford University Press.View ArticleGoogle Scholar
- Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. Valencia A, editor. PLoS Comp Biol. 2009;5:e1000605 Public Library of Science.View ArticleGoogle Scholar
- da Silva RR, Dorrestein PC, Quinn RA. Illuminating the dark matter in metabolomics. Proc Natl Acad Sci USA. 2015;112:12549–50 National Academy of Sciences.View ArticleGoogle Scholar
- Bowen BP, Fischer CR, Baran R, Banfield JF, Northen T. Improved genome annotation through untargeted detection of pathway-specific metabolites. BMC Genomics. 2011;12:S6 BioMed Central.View ArticleGoogle Scholar
- Baran R, Brodie EL, Mayberry-Lewis J, Hummel E, da Rocha UN, Chakraborty R, et al. Exometabolite niche partitioning among sympatric soil bacteria. Nat Comms. 2015;6:8289.View ArticleGoogle Scholar
- Henry CS, Bernstein HC, Weisenhorn P, Taylor RC, Lee J-Y, Zucker J, et al. Microbial community metabolic modeling: a community data-driven network reconstruction. J Cell Physiol. 2016;231:2339–45.View ArticleGoogle Scholar
- Magnúsdóttir S, Heinken A, Kutt L, Ravcheev DA, Bauer E, Noronha A, et al. Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota. Nat Biotechnol. 2017;35:81–9 Nature Publishing Group.View ArticleGoogle Scholar
- Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. 2010;5:93–121 Nature Publishing Group.View ArticleGoogle Scholar
- Reed JL, Patel TR, Chen KH, Joyce AR, Applebee MK, Herring CD, et al. Systems approach to refining genome annotation. Proc Natl Acad Sci USA. 2006;103:17480–4 National Academy of Sciences.View ArticleGoogle Scholar
- Kumar VS, Dasika MS, Maranas CD. Optimization based automated curation of metabolic reconstructions. BMC Bioinform. 2007;8:212 BioMed Central.View ArticleGoogle Scholar
- Kumar VS, Maranas CD. GrowMatch: An Automated Method for Reconciling In Silico/In Vivo Growth Predictions. Ouzounis CA, editor. PLoS Comp Biol. 2009;5:e1000308 Public Library of Science.View ArticleGoogle Scholar
- Benedict MN, Mundy MB, Henry CS, Chia N, Price ND. Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models. Maranas CD, editor. PLoS Comp Biol. 2014;10:e1003882 Public Library of Science.View ArticleGoogle Scholar
- Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28:977–82 Nature Publishing Group.View ArticleGoogle Scholar
- Ponce-de-León M, Calle-Espinosa J, Peretó J, Montero F. Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach. Vera J, editor. PLoS ONE. 2015;10:e0143626 Public Library of Science.View ArticleGoogle Scholar
- Krumholz EW, IGL L. Sequence-based Network Completion Reveals the Integrality of Missing Reactions in Metabolic Networks. J Biol Chem. 2015;290:19197–207 American Society for Biochemistry and Molecular Biology.View ArticleGoogle Scholar
- Milne CB, Eddy JA, Raju R, Ardekani S, Kim P-J, Senger RS, et al. Metabolic network reconstruction and genome-scale model of butanol-producing strain Clostridium beijerinckii NCIMB 8052. BMC Syst Biol 2014 8:2. 2011;5:130 BioMed Central.Google Scholar
- Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res. 2014;42:D206–14 Oxford University Press.View ArticleGoogle Scholar
- Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–62 Oxford University Press.View ArticleGoogle Scholar
- Caspi R, Billington R, Ferrer L, Foerster H, Fulcher CA, Keseler IM, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2016;44:D471–80.View ArticleGoogle Scholar
- Walter MC, Rattei T, Arnold R, Güldener U, Münsterkötter M, Nenova K, et al. PEDANT covers all complete RefSeq genomes. Nucleic Acids Res. 2009;37:D408–11 Oxford University Press.View ArticleGoogle Scholar
- Quester S, Schomburg D. EnzymeDetector: an integrated enzyme function prediction tool and database. BMC Bioinform. 2011;12:376.View ArticleGoogle Scholar
- Poolman B. Energy transduction in lactic acid bacteria. FEMS Microbiol Rev. 1993;12:125–47.View ArticleGoogle Scholar
- Elbourne LDH, Tetu SG, Hassan KA, Paulsen IT. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life. Nucleic Acids Res. 2017;45:D320–4.View ArticleGoogle Scholar
- McDonald AG, Tipton KF. Fifty-five years of enzyme classification: advances and difficulties. 2nd ed. FEBS J. 2014;281:583–92 Wiley/Blackwell (10.1111).View ArticleGoogle Scholar
- Keseler IM, Mackie A, Santos-Zavaleta A, Billington R, Bonavides-Martínez C, Caspi R, et al. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Res. 2017;45:D543–50.View ArticleGoogle Scholar
- Bernard T, Bridge A, Morgat A, Moretti S, Xenarios I, Pagni M. Reconciliation of metabolites and biochemical reactions for metabolic networks. Brief Bioinform. 2014;15:123–35.View ArticleGoogle Scholar
- Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinform. 2012;13:6 BioMed Central.View ArticleGoogle Scholar
- Lang M, Stelzer M, Schomburg D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011;12:42 BioMed Central.View ArticleGoogle Scholar
- RAST (Rapid Annotation using Subsystem Technology). The NMPDR, SEED-based, prokaryotic genome annotation service. http://rast.nmpdr.org. Accessed 29 Dec 2016.
- Devoid S, Overbeek R, DeJongh M, Vonstein V, Best AA, Henry C. Automated Genome Annotation and Metabolic Model Reconstruction in the SEED and Model SEED, Systems Metabolic Engineering. Totowa: Humana Press; 2013. p. 17–45.Google Scholar
- Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.View ArticleGoogle Scholar
- KEGG Automatic Annotation Server. http://www.genome.jp/tools/kaas. Accessed 24 Oct 2016.
- KEGG BRITE Database. http://www.genome.jp/kegg-bin/get_htext?ko01000.keg. Accessed 29 Nov 2017.
- Kumar N, Skolnick J. EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics. 2012;28:2687–8.View ArticleGoogle Scholar
- Placzek S, Schomburg I, Chang A, Jeske L, Ulbrich M, Tillack J, et al. BRENDA in 2017: new perspectives and new tools in BRENDA. Nucleic Acids Res. 2017;45:D380–8.View ArticleGoogle Scholar
- TransportDB TransAAP. http://www.membranetransport.org/transportDB2/TransAAP_login.html. Accessed 26 Aug 2016.
- KEGG BRITE Transporters Database. http://www.genome.jp/kegg-bin/get_htext?ko02000.keg. Accessed 16 Aug 2017.