Next generation sequencing data of a defined microbial mock community
Abstract
Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, which assesses critical components such as sequencing errors and biases, relies on such datasets. Here we report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes; cover a range of GC contents, genome sizes, and repeat content; and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.
- Authors:
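- Singer, Esther; Andreopoulos, Bill; Bowers, Robert M.; Lee, Janey; Deshpande, Shweta; Chiniquy, Jennifer; Ciobanu, Doina; Klenk, Hans-Peter; Zane, Matthew; Daum, Christopher; Clum, Alicia; Cheng, Jan-Fang; Copeland, Alex; Woyke, Tanja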
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- Newcastle Univ. (United Kingdom)
- Publication Date:
- 2016-09-27
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1624549
- Alternate Identifier(s):
- OSTI ID: 1897470
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Scientific Data
- Additional Journal Information:
- Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2052-4463
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; Science & technology - other topics; Next-generation sequencing; Microbial communities; DNA sequencing; Metagenomics
Citation Formats
Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, and Woyke, Tanja. Next generation sequencing data of a defined microbial mock community. United States: N. p., 2016. Web. doi:10.1038/sdata.2016.81.
Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, & Woyke, Tanja. Next generation sequencing data of a defined microbial mock community. United States. https://doi.org/10.1038/sdata.2016.81
Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, and Woyke, Tanja. 2016. "Next generation sequencing data of a defined microbial mock community". United States. https://doi.org/10.1038/sdata.2016.81. https://www.osti.gov/servlets/purl/1624549.
@article{osti_1624549,
title = {Next generation sequencing data of a defined microbial mock community},
author = {Singer, Esther and Andreopoulos, Bill and Bowers, Robert M. and Lee, Janey and Deshpande, Shweta and Chiniquy, Jennifer and Ciobanu, Doina and Klenk, Hans-Peter and Zane, Matthew and Daum, Christopher and Clum, Alicia and Cheng, Jan-Fang and Copeland, Alex and Woyke, Tanja},
abstractNote = {Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, which assesses critical components such as sequencing errors and biases, relies on such datasets. Here we report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes; cover a range of GC contents, genome sizes, and repeat content; and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.},
doi = {10.1038/sdata.2016.81},
journal = {Scientific Data},
number = 1,
volume = 3,
place = {United States},
year = {2016},
month = {9}
}
Works referenced in this record:
UCHIME improves sensitivity and speed of chimera detection
journal, June 2011
- Edgar, Robert C.; Haas, Brian J.; Clemente, Jose C.
- Bioinformatics, Vol. 27, Issue 16
Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community
journal, October 2015
- Bowers, Robert M.; Clum, Alicia; Tice, Hope
- BMC Genomics, Vol. 16, Issue 1
Analysis of immune, microbiota and metabolome maturation in infants in a clinical trial of Lactobacillus paracasei CBA L74-fermented formula
journal, June 2020
- Roggero, Paola; Liotto, Nadia; Pozzi, Chiara
- Nature Communications, Vol. 11, Issue 1
Spatiotemporal persistence of multiple, diverse clades and toxins of Corynebacterium diphtheriae
journal, March 2021
- Will, Robert C.; Ramamurthy, Thandavarayan; Sharma, Naresh Chand
- Nature Communications, Vol. 12, Issue 1
Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems
journal, January 2011
- Minoche, André E.; Dohm, Juliane C.; Himmelbauer, Heinz
- Genome Biology, Vol. 12, Issue 11
Comparison of DNA Extraction Methods for Microbial Community Profiling with an Application to Pediatric Bronchoalveolar Lavage Samples
journal, April 2012
- Willner, Dana; Daly, Joshua; Whiley, David
- PLoS ONE, Vol. 7, Issue 4
Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform
journal, June 2013
- Kozich, James J.; Westcott, Sarah L.; Baxter, Nielson T.
- Applied and Environmental Microbiology, Vol. 79, Issue 17
GenBank
journal, November 2012
- Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen
- Nucleic Acids Research, Vol. 41, Issue D1
Succession of endophytic fungi and arbuscular mycorrhizal fungi associated with the growth of plant and their correlation with secondary metabolites in the roots of plants
journal, April 2021
- Dang, Hanli; Zhang, Tao; Wang, Zhongke
- BMC Plant Biology, Vol. 21, Issue 1
Evaluation of the Ion Torrent Personal Genome Machine for Gene-Targeted Studies Using Amplicons of the Nitrogenase Gene nifH
journal, April 2015
- Zhang, Bangzhou; Penton, C. Ryan; Xue, Chao
- Applied and Environmental Microbiology, Vol. 81, Issue 13
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data
journal, May 2011
- Miller, Christopher S.; Baker, Brett J.; Thomas, Brian C.
- Genome Biology, Vol. 12, Issue 5
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
journal, March 2010
- Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
- PLoS ONE, Vol. 5, Issue 3
Microbial community structure and functional properties in permanently and seasonally flooded areas in Poyang Lake
journal, March 2020
- Liu, Yang; Ren, Ze; Qu, Xiaodong
- Scientific Reports, Vol. 10, Issue 1
MEMOSys: Platform for Genome-Scale Metabolic Models
book, January 2015
- Pabinger, Stephan; Trajanoski, Zlatko
- Encyclopedia of Metagenomics
UPARSE: highly accurate OTU sequences from microbial amplicon reads
journal, August 2013
- Edgar, Robert C.
- Nature Methods, Vol. 10, Issue 10
A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis
journal, April 2011
- Sun, Y.; Cai, Y.; Huse, S. M.
- Briefings in Bioinformatics, Vol. 13, Issue 1
High-resolution phylogenetic microbial community profiling
journal, February 2016
- Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin
- The ISME Journal, Vol. 10, Issue 8
SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes
journal, May 2012
- Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver
- Bioinformatics, Vol. 28, Issue 14
The SILVA ribosomal RNA gene database project: improved data processing and web-based tools
journal, November 2012
- Quast, Christian; Pruesse, Elmar; Yilmaz, Pelin
- Nucleic Acids Research, Vol. 41, Issue D1
Analysis, Optimization and Verification of Illumina-Generated 16S rRNA Gene Amplicon Surveys
journal, April 2014
- Nelson, Michael C.; Morrison, Hilary G.; Benjamino, Jacquelynn
- PLoS ONE, Vol. 9, Issue 4
Evaluation of 16S rDNA-Based Community Profiling for Human Microbiome Research
journal, June 2012
- ,
- PLoS ONE, Vol. 7, Issue 6
Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins
journal, April 2010
- Turnbaugh, P. J.; Quince, C.; Faith, J. J.
- Proceedings of the National Academy of Sciences, Vol. 107, Issue 16
The advantages of SMRT sequencing
journal, July 2013
- Roberts, Richard J.; Carneiro, Mauricio O.; Schatz, Michael C.
- Genome Biology, Vol. 14, Issue 7
The advantages of SMRT sequencing
journal, June 2013
- Roberts, Richard J.; Carneiro, Mauricio O.; Schatz, Michael C.
- Genome Biology, Vol. 14, Issue 6
MeCorS: Metagenome-enabled error correction of single cell sequencing reads
journal, March 2016
- Bremges, Andreas; Singer, Esther; Woyke, Tanja
- Bioinformatics, Vol. 32, Issue 14
Library preparation methodology can influence genomic and functional predictions in human microbiome research
journal, October 2015
- Jones, Marcus B.; Highlander, Sarah K.; Anderson, Ericka L.
- Proceedings of the National Academy of Sciences, Vol. 112, Issue 45
MEMOSys: Platform for Genome-Scale Metabolic Models
book, January 2013
- Pabinger, Stephan; Trajanoski, Zlatko
- Encyclopedia of Metagenomics
Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons
journal, January 2011
- Haas, B. J.; Gevers, D.; Earl, A. M.
- Genome Research, Vol. 21, Issue 3
The human microbiome: there is much left to do
journal, June 2022
- Ley, Ruth
- Nature, Vol. 606, Issue 7914
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences
journal, August 2013
- Langille, Morgan G. I.; Zaneveld, Jesse; Caporaso, J. Gregory
- Nature Biotechnology, Vol. 31, Issue 9
Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing
journal, January 2015
- Park, Doori; Jung, Je Won; Choi, Beom-Soon
- BMC Genomics, Vol. 16, Issue 1
Works referencing / citing this record:
Key sub-community dynamics of medium-chain carboxylate production
journal, May 2019
- Lambrecht, Johannes; Cichocki, Nicolas; Schattenberg, Florian
- Microbial Cell Factories, Vol. 18, Issue 1
Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community
journal, August 2018
- Hiraoka, Satoshi; Okazaki, Yusuke; Anda, Mizue
- Nature Communications
SpaRC: scalable sequence clustering using Apache Spark
journal, August 2018
- Shi, Lizhen; Meng, Xiandong; Tseng, Elizabeth
- Bioinformatics, Vol. 35, Issue 5
MetaCarvel: linking assembly graph motifs to biological variants
journal, August 2019
- Ghurye, Jay; Treangen, Todd; Fedarko, Marcus
- Genome Biology, Vol. 20, Issue 1
Nitrogen cycling in Sandusky Bay, Lake Erie: oscillations between strong and weak export and implications for harmful algal blooms
journal, January 2018
- Salk, Kateri R.; Bullerjahn, George S.; McKay, Robert Michael L.
- Biogeosciences, Vol. 15, Issue 9
Assessing soil bacterial community and dynamics by integrated high-throughput absolute abundance quantification
journal, January 2018
- Lou, Jun; Yang, Li; Wang, Haizhen
- PeerJ, Vol. 6
Human Virome and Disease: High-Throughput Sequencing for Virus Discovery, Identification of Phage-Bacteria Dysbiosis and Development of Therapeutic Approaches with Emphasis on the Human Gut
journal, July 2019
- Santiago-Rodriguez, Tasha M.; Hollister, Emily B.
- Viruses, Vol. 11, Issue 7
MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs
journal, January 2018
- Huson, Daniel H.; Albrecht, Benjamin; Bağcı, Caner
- Biology Direct, Vol. 13, Issue 1
Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis
journal, August 2018
- Hardwick, Simon A.; Chen, Wendy Y.; Wong, Ted
- Nature Communications, Vol. 9, Issue 1
Quantification of variation and the impact of biomass in targeted 16S rRNA gene sequencing studies
journal, September 2018
- Bender, Jeffrey M.; Li, Fan; Adisetiyo, Helty
- Microbiome, Vol. 6, Issue 1
VITCOMIC2: visualization tool for the phylogenetic composition of microbial communities based on 16S rRNA gene amplicons and metagenomic shotgun sequencing
journal, March 2018
- Mori, Hiroshi; Maruyama, Takayuki; Yano, Masahiro
- BMC Systems Biology, Vol. 12, Issue S2
MetaCarvel: linking assembly graph motifs to biological variants
text, January 2019
- Ghurye, Jay; Treangen, Todd; Fedarko, Marcus
- Springer Nature
Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies
journal, November 2019
- Sevim, Volkan; Lee, Juna; Egan, Robert
- Scientific Data, Vol. 6, Issue 1
De novo Nanopore read quality improvement using deep learning
journal, November 2019
- LaPierre, Nathan; Egan, Rob; Wang, Wei
- BMC Bioinformatics, Vol. 20, Issue 1
Evaluation of Primers Targeting the Diazotroph Functional Gene and Development of NifMAP – A Bioinformatics Pipeline for Analyzing nifH Amplicon Data
journal, April 2018
- Angel, Roey; Nepel, Maximilian; Panhölzl, Christopher
- Frontiers in Microbiology, Vol. 9
Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community
journal, January 2019
- Hiraoka, Satoshi; Okazaki, Yusuke; Anda, Mizue
- Nature Communications, Vol. 10, Issue 1
CAMISIM: simulating metagenomes and microbial communities
journal, February 2019
- Fritz, Adrian; Hofmann, Peter; Majda, Stephan
- Microbiome, Vol. 7, Issue 1
100‐year‐old enigma solved: identification, genomic characterization and biogeography of the yet uncultured Planctomyces bekefii
journal, November 2019
- Dedysh, Svetlana N.; Henke, Petra; Ivanova, Anastasia A.
- Environmental Microbiology, Vol. 22, Issue 1
Neutral mechanisms and niche differentiation in steady-state insular microbial communities revealed by single cell analysis: Non-equilibria systems
journal, November 2018
- Liu, Zishu; Cichocki, Nicolas; Hübschmann, Thomas
- Environmental Microbiology, Vol. 21, Issue 1
BBMerge – Accurate paired shotgun read merging via overlap
journal, October 2017
- Bushnell, Brian; Rood, Jonathan; Singer, Esther
- PLOS ONE, Vol. 12, Issue 10
PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics
journal, October 2019
- Schön, Max E.; Eme, Laura; Ettema, Thijs J. G.
- Bioinformatics
A 16S rDNA PCR-based theoretical to actual delta approach on culturable mock communities revealed severe losses of diversity information
journal, April 2019
- dos Santos, Hellen Ribeiro Martins; Argolo, Caio Suzart; Argôlo-Filho, Ronaldo Costa
- BMC Microbiology, Vol. 19, Issue 1
Plasmid detection and assembly in genomic and metagenomic data sets
journal, May 2019
- Antipov, Dmitry; Raiko, Mikhail; Lapidus, Alla
- Genome Research, Vol. 29, Issue 6
Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy
journal, May 2018
- Sieber, Christian M. K.; Probst, Alexander J.; Sharrar, Allison
- Nature Microbiology, Vol. 3, Issue 7
The use of next generation sequencing for improving food safety: Translation into practice
journal, June 2019
- Jagadeesan, Balamurugan; Gerner-Smidt, Peter; Allard, Marc W.
- Food Microbiology, Vol. 79
STROBE-metagenomics: a STROBE extension statement to guide the reporting of metagenomics studies
journal, October 2020
- Bharucha, Tehmina; Oeser, Clarissa; Balloux, Francois
- The Lancet Infectious Diseases, Vol. 20, Issue 10
Identifying accurate metagenome and amplicon software via a meta-analysis of sequence to taxonomy benchmarking studies
journal, October 2018
- Gardner, Paul P.; Watson, Renee J.; Morgan, Xochitl C.
- PeerJ