OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Abstract

Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.

Authors:
Boyd, Joel A. [1]; Woodcroft, Ben J. [1]; Tyson, Gene W. [1]
  1. Univ. of Queensland, Queensland (Australia). Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences
Publication Date:
Research Org.:
Univ. of Arizona, Tucson, AZ (United States); The Ohio State Univ., Columbus, OH (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1439479
Alternate Identifier(s):
OSTI ID: 1502447
Grant/Contract Number:  
SC0004632; SC0010580; SC0016440
Resource Type:
Journal Article: Published Article
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Volume: 46; Journal Issue: 10; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United States: N. p., 2018. Web. doi:10.1093/nar/gky174.
Boyd, Joel A., Woodcroft, Ben J., & Tyson, Gene W. GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. United States. doi:10.1093/nar/gky174.
Boyd, Joel A., Woodcroft, Ben J., and Tyson, Gene W. 2018. "GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes". United States. doi:10.1093/nar/gky174.
@article{osti_1439479,
title = {GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes},
author = {Boyd, Joel A. and Woodcroft, Ben J. and Tyson, Gene W.},
abstractNote = {Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.},
doi = {10.1093/nar/gky174},
journal = {Nucleic Acids Research},
issn = {0305-1048},
number = 10,
volume = 46,
place = {United States},
year = {2018},
month = {3}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1093/nar/gky174

Citation Metrics:
Cited by: 3 works (citation information provided by Web of Science)