GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes
Abstract
Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers, and GraftM was found to have higher accuracy at the family level with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
- Authors:
- Boyd, Joel A.; Woodcroft, Ben J.; Tyson, Gene W.
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
- Publication Date:
- 2018-03-19
- Research Org.:
- Univ. of Arizona, Tucson, AZ (United States); The Ohio State Univ., Columbus, OH (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1439479
- Alternate Identifier(s):
- OSTI ID: 1502447
- Grant/Contract Number:
- SC0004632; SC0010580; SC0016440
- Resource Type:
- Published Article
- Journal Name:
- Nucleic Acids Research
- Additional Journal Information:
- Journal Name: Nucleic Acids Research; Journal Volume: 46; Journal Issue: 10; Journal ID: ISSN 0305-1048
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United Kingdom: N. p., 2018. Web. doi:10.1093/nar/gky174.
Boyd, Joel A., Woodcroft, Ben J., & Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United Kingdom. https://doi.org/10.1093/nar/gky174
Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. 2018. "GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes". United Kingdom. https://doi.org/10.1093/nar/gky174.
@article{osti_1439479,
title = {GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes},
author = {Boyd, Joel A. and Woodcroft, Ben J. and Tyson, Gene W.},
abstractNote = {Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers, and GraftM was found to have higher accuracy at the family level with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.},
doi = {10.1093/nar/gky174},
journal = {Nucleic Acids Research},
number = 10,
volume = 46,
place = {United Kingdom},
year = {2018},
month = {3}
}
https://doi.org/10.1093/nar/gky174
Works referenced in this record:
Adaptive seeds tame genomic sequence comparison
journal, January 2011
- Kielbasa, S. M.; Wan, R.; Sato, K.
- Genome Research, Vol. 21, Issue 3
Quantitative Phylogenetic Assessment of Microbial Communities in Diverse Environments
journal, February 2007
- von Mering, C.; Hugenholtz, P.; Raes, J.
- Science, Vol. 315, Issue 5815
Prokka: rapid prokaryotic genome annotation
journal, March 2014
- Seemann, T.
- Bioinformatics, Vol. 30, Issue 14
Tackling soil diversity with the assembly of large, complex metagenomes
journal, March 2014
- Howe, Adina Chuang; Jansson, Janet K.; Malfatti, Stephanie A.
- Proceedings of the National Academy of Sciences, Vol. 111, Issue 13
Using the Metagenomics RAST Server (MG-RAST) for Analyzing Shotgun Metagenomes
journal, January 2010
- Glass, E. M.; Wilkening, J.; Wilke, A.
- Cold Spring Harbor Protocols, Vol. 2010, Issue 1
Methane metabolism in the archaeal phylum Bathyarchaeota revealed by genome-centric metagenomics
journal, October 2015
- Evans, P. N.; Parks, D. H.; Chadwick, G. L.
- Science, Vol. 350, Issue 6259
Phylogeny-aware identification and correction of taxonomically mislabeled sequences
journal, May 2016
- Kozlov, Alexey M.; Zhang, Jiajie; Yilmaz, Pelin
- Nucleic Acids Research, Vol. 44, Issue 11
The Genome of M. acetivorans Reveals Extensive Metabolic and Physiological Diversity
journal, April 2002
- Galagan, J. E.
- Genome Research, Vol. 12, Issue 4
Phylogenetic classification of short environmental DNA fragments
journal, February 2008
- Krause, Lutz; Diaz, Naryttza N.; Goesmann, Alexander
- Nucleic Acids Research, Vol. 36, Issue 7
ARB: a software environment for sequence data
journal, February 2004
- Ludwig, W.
- Nucleic Acids Research, Vol. 32, Issue 4
Climate change and the permafrost carbon feedback
journal, April 2015
- Schuur, E. A. G.; McGuire, A. D.; Schädel, C.
- Nature, Vol. 520, Issue 7546
Validation of picogram- and femtogram-input DNA libraries for microscale metagenomics
journal, January 2016
- Rinke, Christian; Low, Serene; Woodcroft, Ben J.
- PeerJ, Vol. 4
Accelerated Profile HMM Searches
journal, October 2011
- Eddy, Sean R.
- PLoS Computational Biology, Vol. 7, Issue 10
Unusual biology across a group comprising more than 15% of domain Bacteria
journal, June 2015
- Brown, Christopher T.; Hug, Laura A.; Thomas, Brian C.
- Nature, Vol. 523, Issue 7559
Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison
journal, March 2013
- Matsen IV, Frederick A.; Evans, Steven N.
- PLoS ONE, Vol. 8, Issue 3
Fast and accurate short read alignment with Burrows-Wheeler transform
journal, May 2009
- Li, H.; Durbin, R.
- Bioinformatics, Vol. 25, Issue 14
Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
journal, May 2013
- Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam
- Nature Biotechnology, Vol. 31, Issue 6
Fermentation, Hydrogen, and Sulfur Metabolism in Multiple Uncultivated Bacterial Phyla
journal, September 2012
- Wrighton, K. C.; Thomas, B. C.; Sharon, I.
- Science, Vol. 337, Issue 6102
Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software
journal, October 2017
- Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter
- Nature Methods, Vol. 14, Issue 11
Decadal vegetation changes in a northern peatland, greenhouse gas fluxes and net radiative forcing
journal, December 2006
- Johansson, Torbjörn; Malmer, Nils; Crill, Patrick M.
- Global Change Biology, Vol. 12, Issue 12
An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea
journal, December 2011
- McDonald, Daniel; Price, Morgan N.; Goodrich, Julia
- The ISME Journal, Vol. 6, Issue 3
Community structure and metabolism through reconstruction of microbial genomes from the environment
journal, February 2004
- Tyson, Gene W.; Chapman, Jarrod; Hugenholtz, Philip
- Nature, Vol. 428, Issue 6978
Basic local alignment search tool
journal, October 1990
- Altschul, Stephen F.; Gish, Warren; Miller, Webb
- Journal of Molecular Biology, Vol. 215, Issue 3, p. 403-410
Treephyler: fast taxonomic profiling of metagenomes
journal, February 2010
- Schreiber, F.; Gumrich, P.; Daniel, R.
- Bioinformatics, Vol. 26, Issue 7
OrfM: a fast open reading frame predictor for metagenomic data
journal, May 2016
- Woodcroft, Ben J.; Boyd, Joel A.; Tyson, Gene W.
- Bioinformatics, Vol. 32, Issue 17
Methylotrophic methanogenesis discovered in the archaeal phylum Verstraetearchaeota
journal, October 2016
- Vanwonterghem, Inka; Evans, Paul N.; Parks, Donovan H.
- Nature Microbiology, Vol. 1, Issue 12
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
journal, March 2010
- Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
- PLoS ONE, Vol. 5, Issue 3
RefSeq: an update on prokaryotic genome annotation and curation
journal, November 2017
- Haft, Daniel H.; DiCuccio, Michael; Badretdin, Azat
- Nucleic Acids Research, Vol. 46, Issue D1
PhyloSift: phylogenetic analysis of genomes and metagenomes
journal, January 2014
- Darling, Aaron E.; Jospin, Guillaume; Lowe, Eric
- PeerJ, Vol. 2
Hidden Markov Models in Computational Biology
journal, February 1994
- Krogh, Anders; Brown, Michael; Mian, I. Saira
- Journal of Molecular Biology, Vol. 235, Issue 5
Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood
journal, March 2011
- Berger, Simon A.; Krompass, Denis; Stamatakis, Alexandros
- Systematic Biology, Vol. 60, Issue 3
IMG: the integrated microbial genomes database and comparative analysis system
journal, December 2011
- Markowitz, V. M.; Chen, I. -M. A.; Palaniappan, K.
- Nucleic Acids Research, Vol. 40, Issue D1
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
journal, January 2015
- Kang, Dongwan D.; Froula, Jeff; Egan, Rob
- PeerJ, Vol. 3
Integrative analysis of environmental sequences using MEGAN4
journal, June 2011
- Huson, D. H.; Mitra, S.; Ruscheweyh, H. -J.
- Genome Research, Vol. 21, Issue 9
pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree
journal, October 2010
- Matsen, Frederick A.; Kodner, Robin B.; Armbrust, E. Virginia
- BMC Bioinformatics, Vol. 11, Issue 1
Rapid identification of high-confidence taxonomic assignments for metagenomic data
journal, April 2012
- MacDonald, Norman J.; Parks, Donovan H.; Beiko, Robert G.
- Nucleic Acids Research, Vol. 40, Issue 14
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
journal, May 2015
- Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.
- Genome Research, Vol. 25, Issue 7
MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes
journal, November 2015
- Petrenko, Pavel; Lobb, Briallen; Kurtz, Daniel A.
- BMC Biology, Vol. 13, Issue 1
Fast and sensitive protein alignment using DIAMOND
journal, November 2014
- Buchfink, Benjamin; Xie, Chao; Huson, Daniel H.
- Nature Methods, Vol. 12, Issue 1
Interactive metagenomic visualization in a Web browser
journal, September 2011
- Ondov, Brian D.; Bergman, Nicholas H.; Phillippy, Adam M.
- BMC Bioinformatics, Vol. 12, Issue 1
MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm
journal, August 2014
- Wu, Yu-Wei; Tang, Yung-Hsu; Tringe, Susannah G.
- Microbiome, Vol. 2, Issue 1
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
journal, January 2013
- Katoh, K.; Standley, D. M.
- Molecular Biology and Evolution, Vol. 30, Issue 4
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
journal, December 2009
- Schnoes, Alexandra M.; Brown, Shoshana D.; Dodevski, Igor
- PLoS Computational Biology, Vol. 5, Issue 12
Works referencing / citing this record:
Methanotrophy across a natural permafrost thaw environment
journal, June 2018
- Singleton, Caitlin M.; McCalley, Carmody K.; Woodcroft, Ben J.
- The ISME Journal, Vol. 12, Issue 10
iMicrobe: Tools and data-driven discovery platform for the microbiome sciences
journal, July 2019
- Youens-Clark, Ken; Bomhoff, Matt; Ponsero, Alise J.
- GigaScience, Vol. 8, Issue 7
Distinct Taxonomic and Functional Profiles of the Microbiome Associated With Different Soil Horizons of a Moist Tussock Tundra in Alaska
journal, June 2019
- Tripathi, Binu M.; Kim, Hye Min; Jung, Ji Young
- Frontiers in Microbiology, Vol. 10
Divergent methyl-coenzyme M reductase genes in a deep-subseafloor Archaeoglobi
journal, January 2019
- Boyd, Joel A.; Jungbluth, Sean P.; Leu, Andy O.
- The ISME Journal, Vol. 13, Issue 5
Anaerobic methane oxidation coupled to manganese reduction by members of the Methanoperedenaceae
journal, January 2020
- Leu, Andy O.; Cai, Chen; McIlroy, Simon J.
- The ISME Journal, Vol. 14, Issue 4
Characterization of a sponge microbiome using an integrative genome-centric approach
journal, January 2020
- Engelberts, J. Pamela; Robbins, Steven J.; de Goeij, Jasper M.
- The ISME Journal, Vol. 14, Issue 5
Metabolic potential of uncultured bacteria and archaea associated with petroleum seepage in deep-sea sediments
journal, April 2019
- Dong, Xiyang; Greening, Chris; Rattray, Jayne E.
- Nature Communications, Vol. 10, Issue 1
Insights into the ecological roles and evolution of methyl-coenzyme M reductase-containing hot spring Archaea
journal, October 2019
- Hua, Zheng-Shuang; Wang, Yu-Lin; Evans, Paul N.
- Nature Communications, Vol. 10, Issue 1
Bacterial fermentation and respiration processes are uncoupled in anoxic permeable sediments
journal, March 2019
- Kessler, Adam J.; Chen, Ya-Jou; Waite, David W.
- Nature Microbiology, Vol. 4, Issue 6
Defining the human gut host–phage network through single-cell viral tagging
journal, August 2019
- Džunková, Mária; Low, Soo Jen; Daly, Joshua N.
- Nature Microbiology, Vol. 4, Issue 12
A genomic view of the reef-building coral Porites lutea and its microbial symbionts
journal, September 2019
- Robbins, Steven J.; Singleton, Caitlin M.; Chan, Cheong Xin
- Nature Microbiology, Vol. 4, Issue 12
An evolving view of methane metabolism in the Archaea
journal, January 2019
- Evans, Paul N.; Boyd, Joel A.; Leu, Andy O.
- Nature Reviews Microbiology, Vol. 17, Issue 4
Genome-centric view of carbon processing in thawing permafrost
journal, July 2018
- Woodcroft, Ben J.; Singleton, Caitlin M.; Boyd, Joel A.
- Nature, Vol. 560, Issue 7716
PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics
journal, October 2019
- Schön, Max E.; Eme, Laura; Ettema, Thijs J. G.
- Bioinformatics
Heliorhodopsins are absent in diderm (Gram‐negative) bacteria: Some thoughts and possible implications for activity
journal, January 2019
- Flores‐Uribe, José; Hevroni, Gur; Ghai, Rohit
- Environmental Microbiology Reports, Vol. 11, Issue 3
Predominance of Anaerobic, Spore-Forming Bacteria in Metabolically Active Microbial Communities from Ancient Siberian Permafrost
journal, May 2019
- Liang, Renxing; Lau, Maggie; Vishnivetskaya, Tatiana
- Applied and Environmental Microbiology, Vol. 85, Issue 15
Draft Genome Sequence of “Candidatus Bathyarchaeota” Archaeon BE326-BA-RLH, an Uncultured Denitrifier and Putative Anaerobic Methanotroph from South Africa’s Deep Continental Biosphere
journal, November 2018
- Harris, Rachel L.; Lau, Maggie C. Y.; Cadar, Andreia
- Microbiology Resource Announcements, Vol. 7, Issue 20
Archaea dominate the microbial community in an ecosystem with low-to-moderate temperature and extreme acidity
journal, January 2019
- Korzhenkov, Aleksei A.; Toshchakov, Stepan V.; Bargiela, Rafael
- Microbiome, Vol. 7, Issue 1