DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Predicting metabolic modules in incomplete bacterial genomes with MetaPathPredict

Journal Article · eLife

The reconstruction of complete microbial metabolic pathways using ‘omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as a Python module, and both options are designed to be run locally or on a compute cluster. Benchmarks show that MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.
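
As a rough illustration of the kind of input such a predictor might consume (this is not MetaPathPredict's actual code or API), the sketch below encodes a genome's gene annotations as a binary presence/absence vector and simulates an incomplete genome by randomly dropping annotations. The function names, the annotation identifiers, and the toy module definition are all hypothetical. The rule-based completeness check is only a naive baseline, shown to make clear why it breaks down once genes are missing.

```python
import random

# Hypothetical annotation vocabulary and toy module definition.
# These identifiers are placeholders, not real KEGG entries.
KO_VOCAB = ["K_TOY1", "K_TOY2", "K_TOY3", "K_TOY4", "K_TOY5"]
TOY_MODULE = {"K_TOY1", "K_TOY2", "K_TOY3"}  # all three are required


def encode_genome(kos, vocab=KO_VOCAB):
    """Return a binary presence/absence vector over the vocabulary."""
    present = set(kos)
    return [1 if ko in present else 0 for ko in vocab]


def simulate_incomplete(kos, keep_fraction, seed=0):
    """Randomly retain a fraction of annotations to mimic an incomplete genome."""
    rng = random.Random(seed)
    unique_kos = sorted(set(kos))
    n_keep = round(len(unique_kos) * keep_fraction)
    return sorted(rng.sample(unique_kos, n_keep))


def module_complete(kos, module=TOY_MODULE):
    """Naive rule: the module counts as present only if every required entry is annotated."""
    return module.issubset(kos)


full_genome = ["K_TOY1", "K_TOY2", "K_TOY3", "K_TOY5"]
partial_genome = simulate_incomplete(full_genome, keep_fraction=0.5)

features_full = encode_genome(full_genome)
features_partial = encode_genome(partial_genome)
```

In this toy setup the rule-based check succeeds on the full annotation set but necessarily fails on the downsampled one, since only two of the three required entries can survive. That gap is the case a learned classifier over vectors like `features_partial` would be trained to handle.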

Sponsoring Organization:
USDOE
OSTI ID:
2345246
Alternate ID(s):
OSTI ID: 2345247; OSTI ID: 2446791; OSTI ID: 2510973; OSTI ID: 2559051
Journal Information:
eLife, Journal Name: eLife Vol. 13; ISSN 2050-084X
Publisher:
eLife Sciences Publications, Ltd.
Country of Publication:
United States
Language:
English
