OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

Journal Article · Microbiome
 [1];  [2];  [3];  [4];  [5]
  1. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Physical Biosciences Division
  2. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); City College of San Francisco, CA (United States)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  4. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States). Biological and Materials Sciences Center
  5. Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division

Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. We have developed a binning algorithm, MaxBin, which uses an expectation-maximization algorithm to automate the binning of metagenomic scaffolds once the sequencing reads have been assembled. On simulated metagenomic datasets, MaxBin binned microbial genomes with high accuracy. MaxBin was also used to recover genomes from metagenomic data obtained through the Human Microbiome Project, demonstrating that it can recover genomes from real metagenomic datasets with variable sequencing coverage. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that had a much smaller genome (5 Mb versus 13 to 14 Mb) but a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared with binning by emergent self-organizing maps (ESOMs) and by differential coverage binning; MaxBin performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. Using the software, dozens of species in the cellulolytic microbial consortia were readily separated, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.
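
To make the expectation-maximization idea in the abstract concrete, here is a minimal sketch of EM-based soft clustering. It is not MaxBin's actual model: the abstract does not specify the features or the likelihood, so this toy assumes each scaffold is described only by its log coverage and that each bin follows a one-dimensional Gaussian. The names `em_bin`, `log_cov` and `init_means` are hypothetical.

```python
import math


def em_bin(log_cov, init_means, n_iter=50):
    """Toy EM: fit a 1-D Gaussian mixture to scaffold log-coverage values.

    Hypothetical helper for illustration only; this is not MaxBin's model.
    Returns (hard bin label per scaffold, fitted bin means).
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each bin for each scaffold,
        # computed in log space to avoid numerical underflow.
        resp = []
        for x in log_cov:
            logs = [
                math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate each bin's weight, mean and variance.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty bins
            weights[j] = nj / len(log_cov)
            means[j] = sum(r[j] * x for r, x in zip(resp, log_cov)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, log_cov)) / nj
            variances[j] = max(var, 1e-3)  # floor keeps the density finite
    # Hard assignment: each scaffold goes to its most probable bin.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means


# Made-up log-coverage values for eight scaffolds from two populations.
log_cov = [1.9, 2.0, 2.1, 2.05, 4.9, 5.0, 5.1, 5.05]
labels, means = em_bin(log_cov, init_means=[1.0, 6.0])
```

The E-step computes, for every scaffold, the probability of belonging to each bin; the M-step updates each bin's parameters from those probabilities. A real binner would use richer features and a likelihood suited to them, and would need a principled way to choose the number of bins and their starting points.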

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization: USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number: AC02-05CH11231
OSTI ID: 1511393
Journal Information: Microbiome, Vol. 2; ISSN 2049-2618
Publisher: BioMed Central
Country of Publication: United States
Language: English
Citation Metrics: Cited by 359 works (citation information provided by Web of Science)

Cited By (11)

Refining the phylum Chlorobi by resolving the phylogeny and metabolic potential of the representative of a deeply branching, uncultivated lineage · journal · September 2015
Anaerobic degradation of hexadecane and phenanthrene coupled to sulfate reduction by enriched consortia from northern Gulf of Mexico seafloor sediment · journal · February 2019
Intensive allochthonous inputs along the Ganges River and their effect on microbial community composition and dynamics · journal · November 2018
IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses · journal · December 2016
VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data · journal · January 2015
Intermediate-Salinity Systems at High Altitudes in the Peruvian Andes Unveil a High Diversity and Abundance of Bacteria and Viruses · journal · November 2019
Metagenomic reconstructions of gut microbial metabolism in weanling pigs · journal · March 2019
Phylogenetic, genomic, and biogeographic characterization of a novel and ubiquitous marine invertebrate-associated Rickettsiales parasite, Candidatus Aquarickettsia rohweri, gen. nov., sp. nov. · journal · August 2019
Accurate and complete genomes from metagenomes · journal · March 2020
KBase Narrative - Metagenome-assembled genomes from Amazonian soil microbial consortia · dataset · January 2022
Machine learning for metagenomics: methods and tools · preprint · January 2015
