DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Abstract

Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can provide a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placement into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with processing times 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg enabled the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
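The abstract contrasts pairwise comparison, which tends to over-classify taxonomy, with classification by placement into a pre-constructed gene tree. The minimal sketch below illustrates that contrast under stated assumptions; it is not GraftM's implementation. The reference leaves (`refA`, `refB`, `refC`), their placeholder lineages and all function names are hypothetical. The placement itself is assumed to have been computed already and is represented only by the set of reference leaves in the clade below the placement edge; the query then receives the lineage that all of those leaves share.

```python
from typing import Dict, List

# A lineage is an ordered list of ranks, most general first.
Lineage = List[str]

# Hypothetical reference taxonomy: leaf name -> lineage (placeholder taxa, not real data).
EXAMPLE_TAXONOMY: Dict[str, Lineage] = {
    "refA": ["d__D1", "p__P1", "c__C1", "o__O1", "f__F1", "g__G1"],
    "refB": ["d__D1", "p__P1", "c__C1", "o__O1", "f__F1", "g__G2"],
    "refC": ["d__D1", "p__P1", "c__C1", "o__O1", "f__F2", "g__G3"],
}


def shared_lineage(lineages: List[Lineage]) -> Lineage:
    """Return the leading ranks common to every lineage, stopping at the first disagreement."""
    shared: Lineage = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify_by_placement(leaves_below_edge: List[str], taxonomy: Dict[str, Lineage]) -> Lineage:
    """Placement-style assignment (illustrative): keep only the lineage shared by
    all reference leaves in the clade below the edge where the query was placed."""
    return shared_lineage([taxonomy[leaf] for leaf in leaves_below_edge])


def classify_by_best_hit(best_hit: str, taxonomy: Dict[str, Lineage]) -> Lineage:
    """Pairwise-style assignment (illustrative): copy the full lineage of the
    single most similar reference sequence."""
    return list(taxonomy[best_hit])


if __name__ == "__main__":
    # Hypothetical query whose closest reference is refA, but which is placed on
    # the edge above the clade {refA, refB} rather than on refA's own branch.
    print(classify_by_best_hit("refA", EXAMPLE_TAXONOMY))
    # prints ['d__D1', 'p__P1', 'c__C1', 'o__O1', 'f__F1', 'g__G1']
    print(classify_by_placement(["refA", "refB"], EXAMPLE_TAXONOMY))
    # prints ['d__D1', 'p__P1', 'c__C1', 'o__O1', 'f__F1']
```

In this toy case the best-hit call asserts genus `g__G1`, while the placement-style call stops at family `f__F1`, the deepest rank on which the whole clade agrees. That is the kind of over-classification the abstract attributes to pairwise approaches; real placement tools additionally weigh placement likelihoods, which this sketch omits.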

Authors:
Boyd, Joel A. [1]; Woodcroft, Ben J. [1]; Tyson, Gene W. [1]
  1. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
Publication Date:
March 19, 2018
Research Org.:
Univ. of Arizona, Tucson, AZ (United States); The Ohio State Univ., Columbus, OH (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1439479
Alternate Identifier(s):
OSTI ID: 1502447
Grant/Contract Number:  
SC0004632; SC0010580; SC0016440
Resource Type:
Published Article
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 10; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United Kingdom: N. p., 2018. Web. doi:10.1093/nar/gky174.
Boyd, Joel A., Woodcroft, Ben J., & Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United Kingdom. https://doi.org/10.1093/nar/gky174
Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. 2018. "GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes". United Kingdom. https://doi.org/10.1093/nar/gky174.
@article{osti_1439479,
title = {GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes},
author = {Boyd, Joel A. and Woodcroft, Ben J. and Tyson, Gene W.},
doi = {10.1093/nar/gky174},
journal = {Nucleic Acids Research},
number = 10,
volume = 46,
place = {United Kingdom},
year = {2018},
month = mar
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1093/nar/gky174

Citation Metrics:
Cited by: 69 works (citation information provided by Web of Science)

Figures / Tables:

Figure 1: Schematic of the GraftM pipeline, outlining the create, search and classify stages. Within the search step, the red arrow indicates the amino acid pipeline and the blue arrow indicates the nucleic acid pipeline.

Works referencing / citing this record:

Methanotrophy across a natural permafrost thaw environment
journal, June 2018

  • Singleton, Caitlin M.; McCalley, Carmody K.; Woodcroft, Ben J.
  • The ISME Journal, Vol. 12, Issue 10
  • DOI: 10.1038/s41396-018-0065-5

iMicrobe: Tools and data-driven discovery platform for the microbiome sciences
journal, July 2019


Distinct Taxonomic and Functional Profiles of the Microbiome Associated With Different Soil Horizons of a Moist Tussock Tundra in Alaska
journal, June 2019


Divergent methyl-coenzyme M reductase genes in a deep-subseafloor Archaeoglobi
journal, January 2019


Anaerobic methane oxidation coupled to manganese reduction by members of the Methanoperedenaceae
journal, January 2020


Characterization of a sponge microbiome using an integrative genome-centric approach
journal, January 2020

  • Engelberts, J. Pamela; Robbins, Steven J.; de Goeij, Jasper M.
  • The ISME Journal, Vol. 14, Issue 5
  • DOI: 10.1038/s41396-020-0591-9

Metabolic potential of uncultured bacteria and archaea associated with petroleum seepage in deep-sea sediments
journal, April 2019


Insights into the ecological roles and evolution of methyl-coenzyme M reductase-containing hot spring Archaea
journal, October 2019


Bacterial fermentation and respiration processes are uncoupled in anoxic permeable sediments
journal, March 2019


Defining the human gut host–phage network through single-cell viral tagging
journal, August 2019


A genomic view of the reef-building coral Porites lutea and its microbial symbionts
journal, September 2019

  • Robbins, Steven J.; Singleton, Caitlin M.; Chan, Cheong Xin
  • Nature Microbiology, Vol. 4, Issue 12
  • DOI: 10.1038/s41564-019-0532-4

An evolving view of methane metabolism in the Archaea
journal, January 2019


Genome-centric view of carbon processing in thawing permafrost
journal, July 2018


PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics
journal, October 2019


Heliorhodopsins are absent in diderm (Gram‐negative) bacteria: Some thoughts and possible implications for activity
journal, January 2019

  • Flores‐Uribe, José; Hevroni, Gur; Ghai, Rohit
  • Environmental Microbiology Reports, Vol. 11, Issue 3
  • DOI: 10.1111/1758-2229.12730

Predominance of Anaerobic, Spore-Forming Bacteria in Metabolically Active Microbial Communities from Ancient Siberian Permafrost
journal, May 2019

  • Liang, Renxing; Lau, Maggie; Vishnivetskaya, Tatiana
  • Applied and Environmental Microbiology, Vol. 85, Issue 15
  • DOI: 10.1128/aem.00560-19

Archaea dominate the microbial community in an ecosystem with low-to-moderate temperature and extreme acidity
journal, January 2019


Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.