Title: Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models

Journal Article · PLoS Computational Biology (Online)
 [1];  [2];  [3];  [4];  [5];  [6]
  1. Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States). Dept. of Chemical and Biomolecular Engineering.
  2. Mayo Clinic, Rochester, MN (United States). Center for Individualized Medicine.
  3. Argonne National Lab. (ANL), Lemont, IL (United States). Mathematics and Computer Science Division.
  4. Mayo Clinic, Rochester, MN (United States). Center for Individualized Medicine, Depts. of Surgery and Physiology and Bioengineering.
  5. Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States). Dept. of Chemical and Biomolecular Engineering; Inst. for Systems Biology, Seattle, WA (United States)
  6. Pennsylvania State Univ., University Park, PA (United States)

Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
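
The contrast the abstract draws between parsimony-based and likelihood-based gap filling can be illustrated with a toy sketch. This is not the authors' formulation or the KBase implementation: the function names, the normalization of homology scores into likelihoods, and the cost `1 - likelihood` are all assumptions made for illustration, and candidate solutions are simply enumerated by hand rather than found by an optimizer.

```python
from typing import Dict, List


def annotation_likelihoods(hit_scores: Dict[str, float]) -> Dict[str, float]:
    """Turn homology scores for one gene's candidate functions into weights.

    Assumption: likelihoods are taken as each score's share of the total.
    The published method may weight homology evidence differently.
    """
    total = sum(hit_scores.values())
    if total <= 0:
        return {function: 0.0 for function in hit_scores}
    return {function: score / total for function, score in hit_scores.items()}


def solution_cost(reactions: List[str],
                  reaction_likelihood: Dict[str, float],
                  use_likelihood: bool) -> float:
    """Cost of adding a set of reactions to the draft network.

    Parsimony: every added reaction costs 1, so fewer reactions is better.
    Likelihood-based (hypothetical cost): a reaction costs 1 - likelihood,
    so reactions with genomic support are cheaper to add.
    """
    if not use_likelihood:
        return float(len(reactions))
    return sum(1.0 - reaction_likelihood.get(r, 0.0) for r in reactions)


def pick_solution(candidates: List[List[str]],
                  reaction_likelihood: Dict[str, float],
                  use_likelihood: bool) -> List[str]:
    """Return the candidate gap filling solution with the lowest cost."""
    return min(
        candidates,
        key=lambda rxns: solution_cost(rxns, reaction_likelihood, use_likelihood),
    )


# Made-up example: two alternative routes that both restore a missing pathway.
candidates = [
    ["rxnA", "rxnB"],          # short route with little genomic support
    ["rxnC", "rxnD", "rxnE"],  # longer route with strong genomic support
]
reaction_likelihood = {
    "rxnA": 0.0, "rxnB": 0.1,
    "rxnC": 0.9, "rxnD": 0.8, "rxnE": 0.9,
}

parsimony_choice = pick_solution(candidates, reaction_likelihood, use_likelihood=False)
likelihood_choice = pick_solution(candidates, reaction_likelihood, use_likelihood=True)
```

In this made-up case the parsimony criterion selects the two-reaction route, while the likelihood-weighted criterion selects the three-reaction route because its reactions carry more genomic support. Both routes restore the pathway, which is consistent with the abstract's observation that phenotype data alone may not discriminate between alternative solutions.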

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
FG02-10ER64999; AC02-06CH11357
OSTI ID:
1212409
Journal Information:
PLoS Computational Biology (Online), Vol. 10, Issue 10; ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 46 works
Citation information provided by
Web of Science