DOE PAGES, U.S. Department of Energy, Office of Scientific and Technical Information

Title: Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes

Abstract

Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, which are computationally daunting tasks, especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions is of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or those used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacteria and 16 archaea genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.

Authors:
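 Guo, Jiarong [1]; Quensen, John F. [1]; Sun, Yanni [2]; Wang, Qiong [1]; Brown, C. Titus [3]; Cole, James R. [1]; Tiedje, James M. [1]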
  1. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology
  2. City Univ. of Hong Kong (Hong Kong). Dept. of Electronic Engineering
  3. Univ. of California, Davis, CA (United States). Dept. of Population Health and Reproduction
Publication Date:
2019-10-15
Research Org.:
Univ. of Wisconsin, Madison, WI (United States); Michigan State Univ., East Lansing, MI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1799680
Grant/Contract Number:  
FC02-07ER64494; FG02-99ER62848
Resource Type:
Accepted Manuscript
Journal Name:
Frontiers in Genetics
Additional Journal Information:
Journal Volume: 10; Journal ID: ISSN 1664-8021
Publisher:
Frontiers Media S.A.
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Genetics & Heredity; gene-targeted assembly; microbial ecology; gene-centric assembly; Xander; MegaGTA

Citation Formats

Guo, Jiarong, Quensen, John F., Sun, Yanni, Wang, Qiong, Brown, C. Titus, Cole, James R., and Tiedje, James M. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. United States: N. p., 2019. Web. doi:10.3389/fgene.2019.00957.
Guo, Jiarong, Quensen, John F., Sun, Yanni, Wang, Qiong, Brown, C. Titus, Cole, James R., & Tiedje, James M. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. United States. https://doi.org/10.3389/fgene.2019.00957
Guo, Jiarong, Quensen, John F., Sun, Yanni, Wang, Qiong, Brown, C. Titus, Cole, James R., and Tiedje, James M. 2019. "Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes". United States. https://doi.org/10.3389/fgene.2019.00957. https://www.osti.gov/servlets/purl/1799680.
@article{osti_1799680,
title = {Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes},
author = {Guo, Jiarong and Quensen, John F. and Sun, Yanni and Wang, Qiong and Brown, C. Titus and Cole, James R. and Tiedje, James M.},
abstractNote = {Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions are of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assembly by leveraging rich databases, especially for those genes that are of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes, a synthetic community of artificial reads derived from the genomes of 17 bacteria, shotgun sequence data from a mock community with 48 bacteria and 16 archaea genomes, and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. 
Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome for estimating required sequencing depth.},
doi = {10.3389/fgene.2019.00957},
journal = {Frontiers in Genetics},
volume = 10,
place = {United States},
year = {2019},
month = {10}
}
