DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Recovering complete and draft population genomes from metagenome datasets

Abstract

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite its computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), exploiting covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and show how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
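The idea of binning by covarying contig coverage can be illustrated with a minimal sketch. It assumes per-contig mean read coverage has already been computed for each sample, and it groups contigs whose log-transformed coverage profiles are strongly correlated across samples, using single-linkage clustering. The function names, the `min_r` threshold of 0.95, and the example contig values are hypothetical. Published binners such as CONCOCT, MaxBin 2.0, and MetaBAT also use sequence composition (e.g., tetranucleotide frequencies) and more elaborate models, so this is not the method of any particular tool.

```python
import math
from typing import Dict, List

# Hypothetical example: mean coverage of each contig in four samples.
EXAMPLE_COVERAGE: Dict[str, List[float]] = {
    "contig_A1": [10.0, 50.0, 5.0, 80.0],
    "contig_A2": [12.0, 55.0, 6.0, 90.0],
    "contig_B1": [40.0, 8.0, 60.0, 3.0],
    "contig_B2": [38.0, 9.0, 55.0, 4.0],
}


def log_profile(cov: List[float]) -> List[float]:
    """Log-transform coverage with a pseudocount so zero-coverage samples stay defined."""
    return [math.log(c + 1.0) for c in cov]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bin_by_coverage(coverage: Dict[str, List[float]], min_r: float = 0.95) -> List[List[str]]:
    """Hypothetical binner: single-linkage clustering of contigs whose
    log-coverage profiles across samples correlate at or above min_r."""
    names = sorted(coverage)
    profiles = {n: log_profile(coverage[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Link every pair of contigs whose coverage covaries strongly.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    # Collect contigs by their cluster root; each bin keeps sorted order.
    bins: Dict[str, List[str]] = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    return sorted(bins.values())


if __name__ == "__main__":
    print(bin_by_coverage(EXAMPLE_COVERAGE))
```

Pearson correlation over only a handful of samples is noisy, so coverage covariance becomes more informative as the number of samples grows. Single-linkage clustering can also chain unrelated contigs together through intermediate contigs, which is one way chimeric bins arise and why validation of bins matters.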

Authors:
Sangwan, Naseer [1]; Xia, Fangfang [2]; Gilbert, Jack A. [3]
  1. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States); Marine Biological Lab., Woods Hole, MA (United States)
Publication Date:
March 2016
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1258642
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
Microbiome
Additional Journal Information:
Journal Volume: 4; Journal Issue: 1; Journal ID: ISSN 2049-2618
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Metagenomics; Genotype; Assembly; Binning; Curation

Citation Formats

Sangwan, Naseer, Xia, Fangfang, and Gilbert, Jack A. Recovering complete and draft population genomes from metagenome datasets. United States: N. p., 2016. Web. doi:10.1186/s40168-016-0154-5.
Sangwan, Naseer, Xia, Fangfang, & Gilbert, Jack A. Recovering complete and draft population genomes from metagenome datasets. United States. doi:10.1186/s40168-016-0154-5.
Sangwan, Naseer, Xia, Fangfang, and Gilbert, Jack A. 2016. "Recovering complete and draft population genomes from metagenome datasets". United States. doi:10.1186/s40168-016-0154-5. https://www.osti.gov/servlets/purl/1258642.
@article{osti_1258642,
title = {Recovering complete and draft population genomes from metagenome datasets},
author = {Sangwan, Naseer and Xia, Fangfang and Gilbert, Jack A.},
abstractNote = {Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite its computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), exploiting covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and show how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.},
doi = {10.1186/s40168-016-0154-5},
journal = {Microbiome},
number = 1,
volume = 4,
place = {United States},
year = {2016},
month = {3}
}

Citation Metrics:
Cited by: 33 works
Citation information provided by Web of Science

Works referenced in this record:

Biogeography: An Emerging Cornerstone for Understanding Prokaryotic Diversity, Ecology, and Evolution
journal, November 2006


Community structure and metabolism through reconstruction of microbial genomes from the environment
journal, February 2004

  • Tyson, Gene W.; Chapman, Jarrod; Hugenholtz, Philip
  • Nature, Vol. 428, Issue 6978
  • DOI: 10.1038/nature02340

Ecological roles of dominant and rare prokaryotes in acid mine drainage revealed by metagenomics and metatranscriptomics
journal, November 2014

  • Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing
  • The ISME Journal, Vol. 9, Issue 6
  • DOI: 10.1038/ismej.2014.212

Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization
journal, August 2012

  • Sharon, I.; Morowitz, M. J.; Thomas, B. C.
  • Genome Research, Vol. 23, Issue 1
  • DOI: 10.1101/gr.142315.112

Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen
journal, January 2011


Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota
journal, February 2012


Small Genomes and Sparse Metabolisms of Sediment-Associated Bacteria from Four Candidate Phyla
journal, October 2013


Strain recovery from metagenomes
journal, October 2015


Strain-resolved community genomic analysis of gut microbial colonization in a premature infant
journal, December 2010

  • Morowitz, M. J.; Denef, V. J.; Costello, E. K.
  • Proceedings of the National Academy of Sciences, Vol. 108, Issue 3
  • DOI: 10.1073/pnas.1010992108

Niche and host-associated functional signatures of the root surface microbiome
journal, September 2014

  • Ofek-Lalzar, Maya; Sela, Noa; Goldman-Voronov, Milana
  • Nature Communications, Vol. 5, Issue 1
  • DOI: 10.1038/ncomms5950

Microbial Metagenomics: Beyond the Genome
journal, January 2011


The complete genome sequence for putative H2- and S-oxidizer Candidatus Sulfuricurvum sp., assembled de novo from an aquifer-derived metagenome
journal, April 2014

  • Handley, Kim M.; Bartels, Daniela; O'Loughlin, Edward J.
  • Environmental Microbiology, Vol. 16, Issue 11
  • DOI: 10.1111/1462-2920.12453

Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw
journal, November 2011

  • Mackelprang, Rachel; Waldrop, Mark P.; DeAngelis, Kristen M.
  • Nature, Vol. 480, Issue 7377
  • DOI: 10.1038/nature10576

Fermentation, Hydrogen, and Sulfur Metabolism in Multiple Uncultivated Bacterial Phyla
journal, September 2012


Extraordinary phylogenetic diversity and metabolic versatility in aquifer sediment
journal, August 2013

  • Castelle, Cindy J.; Hug, Laura A.; Wrighton, Kelly C.
  • Nature Communications, Vol. 4, Issue 1
  • DOI: 10.1038/ncomms3120

Arsenic rich Himalayan hot spring metagenomics reveal genetically novel predator-prey genotypes
journal, July 2015

  • Sangwan, Naseer; Lambert, Carey; Sharma, Anukriti
  • Environmental Microbiology Reports, Vol. 7, Issue 6
  • DOI: 10.1111/1758-2229.12297

Whole-Genome Draft Sequences of 26 Enterohemorrhagic Escherichia coli O157:H7 Strains
journal, February 2013


Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data
journal, February 2012


Repetitive DNA and next-generation sequencing: computational challenges and solutions
journal, November 2011

  • Treangen, Todd J.; Salzberg, Steven L.
  • Nature Reviews Genetics, Vol. 13, Issue 1
  • DOI: 10.1038/nrg3117

Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut
journal, January 2014

  • Vázquez-Castellanos, Jorge F.; García-López, Rodrigo; Pérez-Brocal, Vicente
  • BMC Genomics, Vol. 15, Issue 1
  • DOI: 10.1186/1471-2164-15-37

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads
journal, July 2012

  • Namiki, Toshiaki; Hachiya, Tsuyoshi; Tanaka, Hideaki
  • Nucleic Acids Research, Vol. 40, Issue 20
  • DOI: 10.1093/nar/gks678

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
journal, May 2012

  • Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry
  • Journal of Computational Biology, Vol. 19, Issue 5
  • DOI: 10.1089/cmb.2012.0021

Ray Meta: scalable de novo metagenome assembly and profiling
journal, January 2012

  • Boisvert, Sébastien; Raymond, Frédéric; Godzaridis, Élénie
  • Genome Biology, Vol. 13, Issue 12
  • DOI: 10.1186/gb-2012-13-12-r122

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth
journal, April 2012


MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
journal, January 2015


Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
journal, July 2012

  • Pell, J.; Hintze, A.; Canino-Koning, R.
  • Proceedings of the National Academy of Sciences, Vol. 109, Issue 33
  • DOI: 10.1073/pnas.1121464109

Improved Assemblies Using a Source-Agnostic Pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of Contigs
journal, October 2014

  • Scholz, Matthew; Lo, Chien-Chi; Chain, Patrick S. G.
  • Scientific Reports, Vol. 4, Issue 1
  • DOI: 10.1038/srep06480

Using cascading Bloom filters to improve the memory usage for de Brujin graphs
journal, January 2014

  • Salikhov, Kamil; Sacomoto, Gustavo; Kucherov, Gregory
  • Algorithms for Molecular Biology, Vol. 9, Issue 1
  • DOI: 10.1186/1748-7188-9-2

ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies
journal, January 2013


REAPR: a universal tool for genome assembly evaluation
journal, January 2013


Individual genome assembly from complex community short-read metagenomic datasets
journal, October 2011

  • Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos C.
  • The ISME Journal, Vol. 6, Issue 4
  • DOI: 10.1038/ismej.2011.147

Minimus: a fast, lightweight genome assembler
journal, January 2007

  • Sommer, Daniel D.; Delcher, Arthur L.; Salzberg, Steven L.
  • BMC Bioinformatics, Vol. 8, Issue 1
  • DOI: 10.1186/1471-2105-8-64

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
journal, February 2008


SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
journal, December 2012


Ray: Simultaneous Assembly of Reads from a Mix of High-Throughput Sequencing Technologies
journal, November 2010

  • Boisvert, Sébastien; Laviolette, François; Corbeil, Jacques
  • Journal of Computational Biology, Vol. 17, Issue 11
  • DOI: 10.1089/cmb.2009.0238

An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data
journal, January 2015

  • Deng, Xutao; Naccache, Samia N.; Ng, Terry
  • Nucleic Acids Research, Vol. 43, Issue 7
  • DOI: 10.1093/nar/gkv002

ABySS: A parallel assembler for short read sequence data
journal, February 2009


CAP3: A DNA Sequence Assembly Program
journal, September 1999


Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data
journal, May 2011


Evaluation of short read metagenomic assembly
journal, January 2011


Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets
journal, October 2013


A General Coverage Theory for Shotgun DNA Sequencing
journal, July 2006


A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea
journal, December 2009

  • Wu, Dongying; Hugenholtz, Philip; Mavromatis, Konstantinos
  • Nature, Vol. 462, Issue 7276
  • DOI: 10.1038/nature08656

Estimating coverage in metagenomic data sets and why it matters
journal, May 2014

  • Rodriguez-R, Luis M.; Konstantinidis, Konstantinos T.
  • The ISME Journal, Vol. 8, Issue 11
  • DOI: 10.1038/ismej.2014.76

Key roles for freshwater Actinobacteria revealed by deep metagenomic sequencing
journal, November 2014

  • Ghai, Rohit; Mizuno, Carolina Megumi; Picazo, Antonio
  • Molecular Ecology, Vol. 23, Issue 24
  • DOI: 10.1111/mec.12985

Ecological Succession and Viability of Human-Associated Microbiota on Restroom Surfaces
journal, November 2014

  • Gibbons, Sean M.; Schwartz, Tara; Fouquier, Jennifer
  • Applied and Environmental Microbiology, Vol. 81, Issue 2
  • DOI: 10.1128/AEM.03117-14

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
journal, May 2013

  • Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam
  • Nature Biotechnology, Vol. 31, Issue 6
  • DOI: 10.1038/nbt.2579

Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons
journal, December 2012


Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
journal, July 2014

  • Nielsen, H. Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska
  • Nature Biotechnology, Vol. 32, Issue 8
  • DOI: 10.1038/nbt.2939

A metagenome-wide association study of gut microbiota in type 2 diabetes
journal, September 2012


Genome Project Standards in a New Era of Sequencing
journal, October 2009


MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
journal, January 2015


Binning metagenomic contigs by coverage and composition
journal, September 2014

  • Alneberg, Johannes; Bjarnason, Brynjar Smári; de Bruijn, Ino
  • Nature Methods, Vol. 11, Issue 11
  • DOI: 10.1038/nmeth.3103

MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets
journal, October 2015


GroopM: an automated tool for the recovery of population genomes from related metagenomes
journal, January 2014

  • Imelfort, Michael; Parks, Donovan; Woodcroft, Ben J.
  • PeerJ, Vol. 2
  • DOI: 10.7717/peerj.603

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
journal, July 2013

  • Bradnam, Keith R.; Fass, Joseph N.; Alexandrov, Anton
  • GigaScience, Vol. 2, Issue 1
  • DOI: 10.1186/2047-217X-2-10

Alignathon: a competitive assessment of whole-genome alignment methods
journal, October 2014


GAGE: A critical evaluation of genome assemblies and assembly algorithms
journal, January 2012

  • Salzberg, S. L.; Phillippy, A. M.; Zimin, A.
  • Genome Research, Vol. 22, Issue 3
  • DOI: 10.1101/gr.131383.111

Automated ensemble assembly and validation of microbial genomes
journal, May 2014

  • Koren, Sergey; Treangen, Todd J.; Hill, Christopher M.
  • BMC Bioinformatics, Vol. 15, Issue 1
  • DOI: 10.1186/1471-2105-15-126

GAM-NGS: genomic assemblies merger for next generation sequencing
journal, April 2013


RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
journal, January 2011

  • Nalbantoglu, Ozkan U.; Way, Samuel F.; Hinrichs, Steven H.
  • BMC Bioinformatics, Vol. 12, Issue 1
  • DOI: 10.1186/1471-2105-12-41

Applying Shannon's information theory to bacterial and phage genomes and metagenomes
journal, January 2013

  • Akhter, Sajia; Bailey, Barbara A.; Salamon, Peter
  • Scientific Reports, Vol. 3, Issue 1
  • DOI: 10.1038/srep01033

Unusual biology across a group comprising more than 15% of domain Bacteria
journal, June 2015

  • Brown, Christopher T.; Hug, Laura A.; Thomas, Brian C.
  • Nature, Vol. 523, Issue 7559
  • DOI: 10.1038/nature14486

The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria
journal, October 2013


Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria
journal, April 2015


A simple, fast, and accurate method of phylogenomic inference
journal, January 2008


Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2
journal, February 2012


The Pfam protein families database
journal, November 2011

  • Punta, M.; Coggill, P. C.; Eberhardt, R. Y.
  • Nucleic Acids Research, Vol. 40, Issue D1
  • DOI: 10.1093/nar/gkr1065

The TIGRFAMs database of protein families
journal, January 2003


Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage
journal, December 2011

  • Dupont, Chris L.; Rusch, Douglas B.; Yooseph, Shibu
  • The ISME Journal, Vol. 6, Issue 6
  • DOI: 10.1038/ismej.2011.189

The comprehensive microbial resource
journal, November 2009

  • Davidsen, Tanja; Beck, Erin; Ganapathy, Anuradha
  • Nucleic Acids Research, Vol. 38, Issue suppl_1
  • DOI: 10.1093/nar/gkp912

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
journal, May 2015

  • Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.
  • Genome Research, Vol. 25, Issue 7
  • DOI: 10.1101/gr.186072.114

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
journal, June 2015


Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data
journal, April 2013

  • Edwards, David J.; Holt, Kathryn E.
  • Microbial Informatics and Experimentation, Vol. 3, Issue 1
  • DOI: 10.1186/2042-5783-3-2

Environmental shaping of codon usage and functional adaptation across microbial communities
journal, August 2013

  • Roller, Maša; Lucić, Vedran; Nagy, István
  • Nucleic Acids Research, Vol. 41, Issue 19
  • DOI: 10.1093/nar/gkt673

Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles
journal, January 2011


Genome sequence of Staphylococcus lugdunensis N920143 allows identification of putative colonization and virulence factors
journal, July 2011


Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance
journal, January 2012


Pvclust: an R package for assessing the uncertainty in hierarchical clustering
journal, April 2006


Community-wide analysis of microbial genome sequence signatures
journal, January 2009

  • Dick, Gregory J.; Andersson, Anders F.; Baker, Brett J.
  • Genome Biology, Vol. 10, Issue 8
  • DOI: 10.1186/gb-2009-10-8-r85

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition
journal, December 2011

  • Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.
  • Nucleic Acids Research, Vol. 40, Issue 5
  • DOI: 10.1093/nar/gkr1204

MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm
journal, August 2014


Works referencing / citing this record:

CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
journal, December 2017


Rhizosphere microbiome structure alters to enable wilt resistance in tomato
journal, October 2018

  • Kwak, Min-Jung; Kong, Hyun Gi; Choi, Kihyuck
  • Nature Biotechnology, Vol. 36, Issue 11
  • DOI: 10.1038/nbt.4232