Title: GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gky174 · OSTI ID: 1439479
 [1];  [1];  [1]
  1. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia

Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM had higher accuracy at the family level, with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
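The abstract describes classification by placement into a pre-constructed gene tree. One common way to turn a placement into a taxonomic label, sketched below, is to report only the ranks shared by every reference leaf beneath the placement edge. This is an illustrative assumption, not GraftM's actual implementation: the function names, the leaf identifiers (`ref_a`, `ref_b`, `ref_c`) and the lineages are hypothetical.

```python
from typing import Dict, List


def consensus_taxonomy(lineages: List[List[str]]) -> List[str]:
    """Return the leading ranks that every lineage agrees on."""
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so ragged inputs are handled.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


def classify_placement(
    leaves_below_edge: List[str], leaf_taxonomy: Dict[str, List[str]]
) -> List[str]:
    """Label a placed sequence with the consensus of the leaves under its edge."""
    lineages = [leaf_taxonomy[leaf] for leaf in leaves_below_edge]
    return consensus_taxonomy(lineages)


# Hypothetical reference leaves and lineages, for illustration only.
LEAF_TAXONOMY: Dict[str, List[str]] = {
    "ref_a": ["Archaea", "Euryarchaeota", "Methanomicrobia",
              "Methanosarcinales", "Methanosarcinaceae"],
    "ref_b": ["Archaea", "Euryarchaeota", "Methanomicrobia",
              "Methanosarcinales", "Methanotrichaceae"],
    "ref_c": ["Archaea", "Euryarchaeota", "Methanobacteria",
              "Methanobacteriales", "Methanobacteriaceae"],
}

if __name__ == "__main__":
    # A placement deep in the tree keeps more ranks than one near the root.
    print(classify_placement(["ref_a", "ref_b"], LEAF_TAXONOMY))
    print(classify_placement(["ref_a", "ref_b", "ref_c"], LEAF_TAXONOMY))
```

A placement near the tips yields a label down to order or family, while a placement on a deep branch yields only the ranks its descendants share. This conservative behavior is one way a placement-based approach can avoid the over-classification the abstract attributes to pairwise comparison.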

Research Organization:
Univ. of Arizona, Tucson, AZ (United States); The Ohio State Univ., Columbus, OH (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
SC0004632; SC0010580; SC0016440
OSTI ID:
1439479
Alternate ID(s):
OSTI ID: 1502447
Journal Information:
Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 10; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Citation Metrics:
Cited by: 69 works (citation information provided by Web of Science)
