MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm
Abstract
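Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 Mb versus 13 to 14 Mb) but a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that MaxBin performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.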
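The abstract says MaxBin bins scaffolds with an expectation-maximization algorithm but does not spell out the model. As a rough illustration only, the Python sketch below soft-assigns scaffolds to bins using two signals commonly used in binning: tetranucleotide frequency (TNF) profiles and read coverage. Each bin keeps a TNF centroid and a mean coverage; the E-step scores a scaffold by a Gaussian on its Euclidean TNF distance to the centroid plus a Poisson log-likelihood of its coverage, and the M-step re-estimates centroids and mean coverages from the soft assignments. The function names (`tnf`, `em_bin`, `random_scaffold`), the placeholder Gaussian parameters `mu` and `sigma`, and the use of caller-supplied seed scaffolds are hypothetical choices for this sketch, not MaxBin's implementation or parameters.

```python
import math
import random
from itertools import product

# Hypothetical, simplified EM binning sketch; NOT MaxBin's actual code or parameters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_index(k=4):
    """Map every k-mer to the index of its canonical form (min of k-mer and reverse complement)."""
    index, mapping = {}, {}
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        key = min(kmer, kmer.translate(COMPLEMENT)[::-1])
        mapping[kmer] = index.setdefault(key, len(index))
    return mapping, len(index)


CANON, N_FEATURES = canonical_index(4)  # 136 canonical tetranucleotides


def tnf(seq):
    """Tetranucleotide frequency vector; 4-mers containing N or other codes are skipped."""
    counts = [0.0] * N_FEATURES
    seq = seq.upper()
    for i in range(len(seq) - 3):
        j = CANON.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_poisson(x, lam):
    # lgamma lets non-integer mean coverages be scored.
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_bin(seqs, covs, seed_idx, n_iter=20, mu=0.0, sigma=0.01):
    """Soft-assign scaffolds to len(seed_idx) bins; mu and sigma are placeholder values."""
    feats = [tnf(s) for s in seqs]
    centers = [list(feats[i]) for i in seed_idx]   # bin TNF centroids
    lams = [max(covs[i], 1e-6) for i in seed_idx]  # bin mean coverages
    for _ in range(n_iter):
        # E-step: responsibilities from TNF-distance Gaussian plus coverage Poisson, in log space.
        resp = []
        for f, c in zip(feats, covs):
            logs = [log_gauss(math.dist(f, ck), mu, sigma) + log_poisson(c, lk)
                    for ck, lk in zip(centers, lams)]
            top = max(logs)
            w = [math.exp(lg - top) for lg in logs]
            z = sum(w)
            resp.append([x / z for x in w])
        # M-step: responsibility-weighted TNF centroids and mean coverages.
        for k in range(len(centers)):
            wk = [r[k] for r in resp]
            tot = sum(wk)
            if tot <= 0:
                continue  # empty bin: keep its previous parameters
            centers[k] = [sum(w * f[j] for w, f in zip(wk, feats)) / tot
                          for j in range(N_FEATURES)]
            lams[k] = max(sum(w * c for w, c in zip(wk, covs)) / tot, 1e-6)
    labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
    return labels, resp


def random_scaffold(rng, length, gc):
    """Synthetic scaffold with a given GC fraction (stand-in for a real genome)."""
    at = (1 - gc) / 2
    return "".join(rng.choices("ACGT", weights=[at, gc / 2, gc / 2, at], k=length))


rng = random.Random(1)
seqs = ([random_scaffold(rng, 5000, 0.7) for _ in range(6)]
        + [random_scaffold(rng, 5000, 0.3) for _ in range(6)])
covs = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 40.0, 38.0, 42.0, 41.0, 39.0, 40.0]
labels, resp = em_bin(seqs, covs, seed_idx=[0, 6])
print(labels)
```

With two synthetic "genomes" that differ in both GC content and coverage, the twelve scaffolds split into two bins of six. Scoring in log space keeps a distant bin from underflowing to zero; on real data the distance distribution would need to be estimated rather than fixed, and the number of bins and their seeds chosen by some principled step instead of being passed in.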
- Authors:
- Wu, Yu-Wei; Tang, Yung-Hsu; Tringe, Susannah G.; Simmons, Blake A.; Singer, Steven W.
- Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Physical Biosciences Division
- Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); City College of San Francisco, CA (United States)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
- Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States). Biological and Materials Sciences Center
- Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division
- Publication Date:
- Fri Aug 01 2014
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1511393
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Microbiome
- Additional Journal Information:
- Journal Volume: 2; Journal ID: ISSN 2049-2618
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; binning; metagenomics; expectation-maximization algorithm
Citation Formats
Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., and Singer, Steven W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. United States: N. p., 2014. Web. doi:10.1186/2049-2618-2-26.
Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., & Singer, Steven W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. United States. https://doi.org/10.1186/2049-2618-2-26
Wu, Yu-Wei, Tang, Yung-Hsu, Tringe, Susannah G., Simmons, Blake A., and Singer, Steven W. Fri Aug 01 2014. "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm". United States. https://doi.org/10.1186/2049-2618-2-26. https://www.osti.gov/servlets/purl/1511393.
@article{osti_1511393,
title = {MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm},
author = {Wu, Yu-Wei and Tang, Yung-Hsu and Tringe, Susannah G. and Simmons, Blake A. and Singer, Steven W.},
abstractNote = {Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. 
The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.},
doi = {10.1186/2049-2618-2-26},
journal = {Microbiome},
volume = 2,
place = {United States},
year = {2014},
month = {aug}
}
Works referenced in this record:
MetaCluster 4.0: A Novel Binning Algorithm for NGS Reads and Huge Number of Species
journal, February 2012
- Wang, Yi; Leung, Henry C. M.; Yiu, S. M.
- Journal of Computational Biology, Vol. 19, Issue 2
Separating metagenomic short reads into genomes via clustering
journal, September 2012
- Tanaseichuk, Olga; Borneman, James; Jiang, Tao
- Algorithms for Molecular Biology, Vol. 7, Issue 1
MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
journal, September 2012
- Wang, Y.; Leung, H. C. M.; Yiu, S. M.
- Bioinformatics, Vol. 28, Issue 18
Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw
journal, November 2011
- Mackelprang, Rachel; Waldrop, Mark P.; DeAngelis, Kristen M.
- Nature, Vol. 480, Issue 7377
Complete genome sequence of the myxobacterium Sorangium cellulosum
journal, October 2007
- Schneiker, Susanne; Perlova, Olena; Kaiser, Olaf
- Nature Biotechnology, Vol. 25, Issue 11
KAAS: an automatic genome annotation and pathway reconstruction server
journal, May 2007
- Moriya, Y.; Itoh, M.; Okuda, S.
- Nucleic Acids Research, Vol. 35, Issue S2, p. W182-W185
Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization
journal, August 2012
- Sharon, I.; Morowitz, M. J.; Thomas, B. C.
- Genome Research, Vol. 23, Issue 1
Fast and accurate short read alignment with Burrows-Wheeler transform
journal, May 2009
- Li, H.; Durbin, R.
- Bioinformatics, Vol. 25, Issue 14
Fermentation, Hydrogen, and Sulfur Metabolism in Multiple Uncultivated Bacterial Phyla
journal, September 2012
- Wrighton, K. C.; Thomas, B. C.; Sharon, I.
- Science, Vol. 337, Issue 6102
Data, information, knowledge and principle: back to metabolism in KEGG
journal, November 2013
- Kanehisa, Minoru; Goto, Susumu; Sato, Yoko
- Nucleic Acids Research, Vol. 42, Issue D1
MetaSim—A Sequencing Simulator for Genomics and Metagenomics
journal, October 2008
- Richter, Daniel C.; Ott, Felix; Auch, Alexander F.
- PLoS ONE, Vol. 3, Issue 10
Community-wide analysis of microbial genome sequence signatures
journal, January 2009
- Dick, Gregory J.; Andersson, Anders F.; Baker, Brett J.
- Genome Biology, Vol. 10, Issue 8
Community dynamics of cellulose-adapted thermophilic bacterial consortia
journal, June 2013
- Eichorst, Stephanie A.; Varanasi, Patanjali; Stavila, Vatalie
- Environmental Microbiology, Vol. 15, Issue 9
Glycoside Hydrolase Activities of Thermophilic Bacterial Consortia Adapted to Switchgrass
journal, July 2011
- Gladden, John M.; Allgaier, Martin; Miller, Christopher S.
- Applied and Environmental Microbiology, Vol. 77, Issue 16, p. 5804-5812
Sequencing technologies — the next generation
journal, December 2009
- Metzker, Michael L.
- Nature Reviews Genetics, Vol. 11, Issue 1
Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments
journal, August 2007
- Talavera, Gerard; Castresana, Jose
- Systematic Biology, Vol. 56, Issue 4
Discovery of Microorganisms and Enzymes Involved in High-Solids Decomposition of Rice Straw Using Metagenomic Analyses
journal, October 2013
- Reddy, Amitha P.; Simmons, Christopher W.; D’haeseleer, Patrik
- PLoS ONE, Vol. 8, Issue 10
MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads
journal, July 2012
- Namiki, Toshiaki; Hachiya, Tsuyoshi; Tanaka, Hideaki
- Nucleic Acids Research, Vol. 40, Issue 20
Genomic signature: characterization and classification of species assessed by chaos game representation of sequences
journal, October 1999
- Deschavanne, P. J.; Giron, A.; Vilain, J.
- Molecular Biology and Evolution, Vol. 16, Issue 10
Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representations of DNA sequences
journal, January 1993
- Goldman, Nick
- Nucleic Acids Research, Vol. 21, Issue 10
dbCAN: a web resource for automated carbohydrate-active enzyme annotation
journal, May 2012
- Yin, Yanbin; Mao, Xizeng; Yang, Jincai
- Nucleic Acids Research, Vol. 40, Issue W1
The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics
journal, January 2009
- Cantarel, B. L.; Coutinho, P. M.; Rancurel, C.
- Nucleic Acids Research, Vol. 37, Issue Database
Expanded phylogeny of myxobacteria and evidence for cultivation of the ‘unculturables’
journal, November 2010
- Garcia, Ronald; Gerth, Klaus; Stadler, Marc
- Molecular Phylogenetics and Evolution, Vol. 57, Issue 2
Genes from Nine Genomes Are Separated into Their Organisms in the Dinucleotide Composition Space
journal, January 1998
- Nakashima, H.
- DNA Research, Vol. 5, Issue 5
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
journal, February 2008
- Zerbino, D. R.; Birney, E.
- Genome Research, Vol. 18, Issue 5
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
journal, April 2007
- Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerrie
- Nature Methods, Vol. 4, Issue 6
CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads
book, January 2008
- Chatterji, Sourav; Yamazaki, Ichitaro; Bai, Zhaojun
- Lecture Notes in Computer Science
IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth
journal, April 2012
- Peng, Y.; Leung, H. C. M.; Yiu, S. M.
- Bioinformatics, Vol. 28, Issue 11
Tackling soil diversity with the assembly of large, complex metagenomes
journal, March 2014
- Howe, Adina Chuang; Jansson, Janet K.; Malfatti, Stephanie A.
- Proceedings of the National Academy of Sciences, Vol. 111, Issue 13
Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups
journal, October 2013
- Wu, Dongying; Jospin, Guillaume; Eisen, Jonathan A.
- PLoS ONE, Vol. 8, Issue 10
Genovo: De Novo Assembly for Metagenomes
journal, March 2011
- Laserson, Jonathan; Jojic, Vladimir; Koller, Daphne
- Journal of Computational Biology, Vol. 18, Issue 3
A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-tuples
journal, March 2011
- Wu, Yu-Wei; Ye, Yuzhen
- Journal of Computational Biology, Vol. 18, Issue 3
FragGeneScan: predicting genes in short and error-prone reads
journal, August 2010
- Rho, Mina; Tang, Haixu; Ye, Yuzhen
- Nucleic Acids Research, Vol. 38, Issue 20
Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction
journal, March 2014
- Laczny, Cedric C.; Pinel, Nicolás; Vlassis, Nikos
- Scientific Reports, Vol. 4, Issue 1
Accelerated Profile HMM Searches
journal, October 2011
- Eddy, Sean R.
- PLoS Computational Biology, Vol. 7, Issue 10
Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
journal, May 2013
- Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam
- Nature Biotechnology, Vol. 31, Issue 6
The Mosaic Genome of Anaeromyxobacter dehalogenans Strain 2CP-C Suggests an Aerobic Common Ancestor to the Delta-Proteobacteria
journal, May 2008
- Thomas, Sara H.; Wagner, Ryan D.; Arakaki, Adrian K.
- PLoS ONE, Vol. 3, Issue 5
eggNOG v4.0: nested orthology inference across 3686 organisms
journal, December 2013
- Powell, Sean; Forslund, Kristoffer; Szklarczyk, Damian
- Nucleic Acids Research, Vol. 42, Issue D1
Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota
journal, February 2012
- Iverson, V.; Morris, R. M.; Frazar, C. D.
- Science, Vol. 335, Issue 6068
Extraordinary expansion of a Sorangium cellulosum genome from an alkaline milieu
journal, July 2013
- Han, Kui; Li, Zhi-feng; Peng, Ran
- Scientific Reports, Vol. 3, Issue 1
MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods
journal, May 2011
- Tamura, K.; Peterson, D.; Peterson, N.
- Molecular Biology and Evolution, Vol. 28, Issue 10
Community structure and metabolism through reconstruction of microbial genomes from the environment
journal, February 2004
- Tyson, Gene W.; Chapman, Jarrod; Hugenholtz, Philip
- Nature, Vol. 428, Issue 6978
The Complete Genome of Teredinibacter turnerae T7901: An Intracellular Endosymbiont of Marine Wood-Boring Bivalves (Shipworms)
journal, July 2009
- Yang, Joyce C.; Madupu, Ramana; Durkin, A. Scott
- PLoS ONE, Vol. 4, Issue 7
Genomic mapping by fingerprinting random clones: A mathematical analysis
journal, April 1988
- Lander, Eric S.; Waterman, Michael S.
- Genomics, Vol. 2, Issue 3
Evolutionary Implications of Microbial Genome Tetranucleotide Frequency Biases
journal, January 2003
- Pride, D. T.
- Genome Research, Vol. 13, Issue 2
Integrative analysis of environmental sequences using MEGAN4
journal, June 2011
- Huson, D. H.; Mitra, S.; Ruscheweyh, H.-J.
- Genome Research, Vol. 21, Issue 9
Meta-IDBA: a de Novo assembler for metagenomic data
journal, June 2011
- Peng, Y.; Leung, H. C. M.; Yiu, S. M.
- Bioinformatics, Vol. 27, Issue 13
MUSCLE: multiple sequence alignment with high accuracy and high throughput
journal, March 2004
- Edgar, R. C.
- Nucleic Acids Research, Vol. 32, Issue 5, p. 1792-1797
Proteogenomic Analysis of a Thermophilic Bacterial Consortium Adapted to Deconstruct Switchgrass
journal, July 2013
- D'haeseleer, Patrik; Gladden, John M.; Allgaier, Martin
- PLoS ONE, Vol. 8, Issue 7, Article No. e68465
Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen
journal, January 2011
- Hess, M.; Sczyrba, A.; Egan, R.
- Science, Vol. 331, Issue 6016
Targeted Discovery of Glycoside Hydrolases from a Switchgrass-Adapted Compost Community
journal, January 2010
- Allgaier, Martin; Reddy, Amitha; Park, Joshua I.
- PLoS ONE, Vol. 5, Issue 1
Myxobacteria, producers of novel bioactive substances
journal, September 2001
- Reichenbach, H.
- Journal of Industrial Microbiology and Biotechnology, Vol. 27, Issue 3
Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage
journal, December 2011
- Dupont, Chris L.; Rusch, Douglas B.; Yooseph, Shibu
- The ISME Journal, Vol. 6, Issue 6
Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary redox enzymes
journal, January 2013
- Levasseur, Anthony; Drula, Elodie; Lombard, Vincent
- Biotechnology for Biofuels, Vol. 6, Issue 1, Article No. 41
Separating Metagenomic Short Reads into Genomes via Clustering
book, January 2011
- Tanaseichuk, Olga; Borneman, James; Jiang, Tao
- Lecture Notes in Computer Science
Works referencing / citing this record:
Refining the phylum Chlorobi by resolving the phylogeny and metabolic potential of the representative of a deeply branching, uncultivated lineage
journal, September 2015
- Hiras, Jennifer; Wu, Yu-Wei; Eichorst, Stephanie A.
- The ISME Journal, Vol. 10, Issue 4
Anaerobic degradation of hexadecane and phenanthrene coupled to sulfate reduction by enriched consortia from northern Gulf of Mexico seafloor sediment
journal, February 2019
- Shin, Boryoung; Kim, Minjae; Zengler, Karsten
- Scientific Reports, Vol. 9, Issue 1
Intensive allochthonous inputs along the Ganges River and their effect on microbial community composition and dynamics
journal, November 2018
- Zhang, Si‐Yu; Tsementzi, Despina; Hatt, Janet K.
- Environmental Microbiology, Vol. 21, Issue 1
IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses
journal, December 2016
- Narayanasamy, Shaman; Jarosz, Yohan; Muller, Emilie E. L.
- Genome Biology, Vol. 17, Issue 1
VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data
journal, January 2015
- Laczny, Cedric C.; Sternal, Tomasz; Plugaru, Valentin
- Microbiome, Vol. 3, Issue 1
Intermediate-Salinity Systems at High Altitudes in the Peruvian Andes Unveil a High Diversity and Abundance of Bacteria and Viruses
journal, November 2019
- Castelán-Sánchez, Hugo Gildardo; Elorrieta, Paola; Romoacca, Pedro
- Genes, Vol. 10, Issue 11
Metagenomic reconstructions of gut microbial metabolism in weanling pigs
journal, March 2019
- Wang, Weilan; Hu, Huifeng; Zijlstra, Ruurd T.
- Microbiome, Vol. 7, Issue 1
Phylogenetic, genomic, and biogeographic characterization of a novel and ubiquitous marine invertebrate-associated Rickettsiales parasite, Candidatus Aquarickettsia rohweri, gen. nov., sp. nov
journal, August 2019
- Klinges, J. Grace; Rosales, Stephanie M.; McMinds, Ryan
- The ISME Journal, Vol. 13, Issue 12
Accurate and complete genomes from metagenomes
journal, March 2020
- Chen, Lin-Xing; Anantharaman, Karthik; Shaiber, Alon
- Genome Research, Vol. 30, Issue 3
KBase Narrative - Metagenome-assembled genomes from Amazonian soil microbial consortia
dataset, January 2022
- Mandro, Jessica
- U.S. Department of Energy Systems Biology Knowledgebase
Machine learning for metagenomics: methods and tools
preprint, January 2015
- Soueidan, Hayssam; Nikolski, Macha
- arXiv