DOE JGI Metagenome Workflow

Journal Article · mSystems
The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included in the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751–D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary in microbial community complexity and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a Workflow Description Language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723–D733, 2021, https://doi.org/10.1093/nar/gkaa983).
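As a rough illustration of how the abstract divides the workflow, the following Python sketch records each stage alongside the channel through which it is made available. The stage names and the two URLs come from the abstract; the `Stage` class, the `stages_by_availability` helper, and the stage ordering (filtering, then assembly, then annotation, then binning) are assumptions made for illustration and are not part of the JGI code.

```python
from dataclasses import dataclass

# URLs quoted in the abstract.
WDL_REPO = "https://code.jgi.doe.gov/BFoster/jgi_meta_wdl"
IMG_SUBMIT = "https://img.jgi.doe.gov/submit"


@dataclass(frozen=True)
class Stage:
    """Hypothetical record: one workflow stage and where it is offered."""
    name: str
    availability: str


# Assumed ordering; the abstract names the stages but does not state
# an explicit execution order.
STAGES = [
    Stage("read filtering", WDL_REPO),
    Stage("assembly", WDL_REPO),
    Stage("structural annotation", IMG_SUBMIT),
    Stage("functional annotation", IMG_SUBMIT),
    Stage("taxonomic annotation", IMG_SUBMIT),
    Stage("binning", IMG_SUBMIT),
]


def stages_by_availability(stages):
    """Group stage names by the channel through which they are offered."""
    grouped = {}
    for stage in stages:
        grouped.setdefault(stage.availability, []).append(stage.name)
    return grouped


if __name__ == "__main__":
    for channel, names in stages_by_availability(STAGES).items():
        print(channel, "->", ", ".join(names))
```

Grouping by channel makes the practical split visible: the read filtering and assembly modules can be run locally from the WDL file, while annotation and binning go through the IMG submission service and require GOLD metadata.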
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
U.S. Department of Energy; USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1826557
Alternate ID(s):
OSTI ID: 1828334
Journal Information:
Journal Name: mSystems; Journal Volume: 6; Journal Issue: 3; ISSN 2379-5077
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English

References (36)

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure journal November 2001
tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences book January 2019
Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea journal August 2017
Metagenomic microbial community profiling using unique clade-specific marker genes journal June 2012
Species-level functional profiling of metagenomes and metatranscriptomes journal October 2018
Indigenous and contaminant microbes in ultradeep mines journal November 2003
Infernal 1.1: 100-fold faster RNA homology searches journal September 2013
Prokka: rapid prokaryotic genome annotation journal March 2014
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph journal January 2015
GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database journal November 2019
The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities journal October 2020
Genomes OnLine Database (GOLD) v.8: overview and updates journal November 2020
The NCBI Taxonomy database journal December 2011
TIGRFAMs and Genome Properties in 2013 journal November 2012
Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions journal April 2013
Expanded microbial genome coverage and improved protein family annotation in the COG database journal November 2014
The Pfam protein families database: towards a more sustainable future journal December 2015
KEGG: new perspectives on genomes, pathways, diseases and drugs journal November 2016
Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families journal November 2017
20 years of the SMART protein domain annotation resource journal October 2017
CATH: expanding the horizons of structure-based functional annotations for genome sequences journal November 2018
MGnify: the microbiome analysis resource in 2020 journal November 2019
Adaptive seeds tame genomic sequence comparison journal January 2011
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes journal May 2015
metaSPAdes: a new versatile metagenomic assembler journal March 2017
Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes journal May 2018
Specific Ribosomal DNA Sequences from Diverse Environmental Settings Correlate with Experimental Contaminants journal August 1998
Analysis of Bacteria Contaminating Ultrapure Water in Industrial Systems journal April 2002
Prodigal: prokaryotic gene recognition and translation initiation site identification journal March 2010
CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats journal June 2007
Reagent and laboratory contamination can critically impact sequence-based microbiome analyses journal November 2014
Improved metagenomic analysis with Kraken 2 journal November 2019
Shotgun metagenomic analysis of microbial communities from the Loxahatchee nature preserve in the Florida Everglades journal January 2020
SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline journal January 2019
Shotgun metagenomic analysis of microbial communities from the Loxahatchee nature preserve in the Florida Everglades collection January 2020
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities journal January 2015

Similar Records

IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes
Journal Article · Thu Oct 04 20:00:00 EDT 2018 · Nucleic Acids Research · OSTI ID:1542357

IMG/M 4 version of the integrated metagenome comparative analysis system
Journal Article · Tue Oct 15 20:00:00 EDT 2013 · Nucleic Acids Research · OSTI ID:1625530

IMG/M: integrated genome and metagenome comparative data analysis system
Journal Article · Wed Oct 12 20:00:00 EDT 2016 · Nucleic Acids Research · OSTI ID:1379657