DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

Abstract

Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes.
The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.

Authors:
Wu, Yu-Wei [1]; Tang, Yung-Hsu [2]; Tringe, Susannah G. [3]; Simmons, Blake A. [4]; Singer, Steven W. [5]
  1. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Physical Biosciences Division
  2. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); City College of San Francisco, CA (United States)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  4. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States). Biological and Materials Sciences Center
  5. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division
Publication Date:
2014-08-01
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1511393
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Microbiome
Additional Journal Information:
Journal Volume: 2; Journal ID: ISSN 2049-2618
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; binning; metagenomics; expectation-maximization algorithm

Citation Formats

Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., and Singer, Steven W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. United States: N. p., 2014. Web. doi:10.1186/2049-2618-2-26.
Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., & Singer, Steven W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. United States. https://doi.org/10.1186/2049-2618-2-26
Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., and Singer, Steven W. 2014. "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm". United States. https://doi.org/10.1186/2049-2618-2-26. https://www.osti.gov/servlets/purl/1511393.
@article{osti_1511393,
title = {MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm},
author = {Wu, Yu-Wei and Tang, Yung-Hsu and Tringe, Susannah G. and Simmons, Blake A. and Singer, Steven W.},
doi = {10.1186/2049-2618-2-26},
journal = {Microbiome},
volume = 2,
place = {United States},
year = {2014},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 359 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: The general workflow of MaxBin. Tetranucleotide frequencies, scaffold coverage levels, and single-copy marker genes are collected from metagenomic scaffolds. An expectation-maximization algorithm then uses the collected information to bin the sequences.
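
The workflow in the Figure 1 caption can be illustrated with a minimal sketch. It is not MaxBin's implementation: the probability model (a Gaussian on the distance between tetranucleotide frequency vectors plus a Gaussian on log-coverage), the `sigma` value, the function names, the toy sequences and the hand-picked seed scaffolds are all assumptions made for illustration. Marker genes, which the caption lists as an input, are not modelled here.

```python
import itertools
import math

# All 256 tetranucleotides in a fixed order (A, C, G, T alphabet).
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}


def tetra_freq(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq."""
    counts = [0] * 256
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])  # windows with N etc. are skipped
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 256


def log_likelihood(freq, cov, centroid, mean_cov, sigma=0.05):
    """Toy model (assumption): Gaussian on tetranucleotide distance
    plus Gaussian on log-coverage difference."""
    dist = math.dist(freq, centroid)
    log_p_tetra = -0.5 * (dist / sigma) ** 2
    log_p_cov = -0.5 * (math.log(cov) - math.log(mean_cov)) ** 2
    return log_p_tetra + log_p_cov


def em_bin(freqs, covs, init_centroids, init_covs, n_iter=20):
    """Assign each scaffold to a bin by expectation-maximization."""
    centroids = [list(c) for c in init_centroids]
    mean_covs = list(init_covs)
    k = len(centroids)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each bin for each scaffold.
        resp = []
        for f, c in zip(freqs, covs):
            logs = [log_likelihood(f, c, centroids[j], mean_covs[j])
                    for j in range(k)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            norm = sum(weights)
            resp.append([w / norm for w in weights])
        # M-step: re-estimate each bin's centroid and mean coverage.
        for j in range(k):
            weight = sum(r[j] for r in resp)
            if weight == 0:
                continue
            centroids[j] = [
                sum(r[j] * f[d] for r, f in zip(resp, freqs)) / weight
                for d in range(256)
            ]
            mean_covs[j] = math.exp(
                sum(r[j] * math.log(c) for r, c in zip(resp, covs)) / weight
            )
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Toy data: two AT-rich and two GC-rich "scaffolds" with distinct coverage.
scaffolds = ["ATATTAAATTTA" * 40, "TTAATATAAATT" * 40,
             "GCGGCCGCGGGC" * 40, "CCGCGGGCGCCG" * 40]
covs = [10.0, 12.0, 50.0, 45.0]
freqs = [tetra_freq(s) for s in scaffolds]
# Seeds are hand-picked here (scaffolds 0 and 2), purely for the demo.
labels = em_bin(freqs, covs, [freqs[0], freqs[2]], [covs[0], covs[2]])
print(labels)
```

The E-step works in log space and subtracts the largest value before exponentiating, so that very unlikely bins do not cause numerical underflow.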
