DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Next generation sequencing data of a defined microbial mock community

Abstract

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, which assesses critical components such as sequencing errors and biases, relies on such datasets. Here we report next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes; cover a range of GC contents, genome sizes, and repeat content; and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving the current sequence data analysis toolkit and spur interest in the development of new tools.

Authors:
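Singer, Esther [1]; Andreopoulos, Bill [1]; Bowers, Robert M. [1]; Lee, Janey [1]; Deshpande, Shweta [1]; Chiniquy, Jennifer [1]; Ciobanu, Doina [1]; Klenk, Hans-Peter [2]; Zane, Matthew [1]; Daum, Christopher [1]; Clum, Alicia [1]; Cheng, Jan-Fang [1]; Copeland, Alex [1]; Woyke, Tanja [1]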
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. Newcastle Univ. (United Kingdom)
Publication Date:
2016-09-27
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1624549
Alternate Identifier(s):
OSTI ID: 1897470
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Data
Additional Journal Information:
Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; Science & technology - other topics; Next-generation sequencing; Microbial communities; DNA sequencing; Metagenomics

Citation Formats

Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, and Woyke, Tanja. Next generation sequencing data of a defined microbial mock community. United States: N. p., 2016. Web. doi:10.1038/sdata.2016.81.
Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, & Woyke, Tanja. Next generation sequencing data of a defined microbial mock community. United States. https://doi.org/10.1038/sdata.2016.81
Singer, Esther, Andreopoulos, Bill, Bowers, Robert M., Lee, Janey, Deshpande, Shweta, Chiniquy, Jennifer, Ciobanu, Doina, Klenk, Hans-Peter, Zane, Matthew, Daum, Christopher, Clum, Alicia, Cheng, Jan-Fang, Copeland, Alex, and Woyke, Tanja. 2016. "Next generation sequencing data of a defined microbial mock community". United States. https://doi.org/10.1038/sdata.2016.81. https://www.osti.gov/servlets/purl/1624549.
@article{osti_1624549,
title = {Next generation sequencing data of a defined microbial mock community},
author = {Singer, Esther and Andreopoulos, Bill and Bowers, Robert M. and Lee, Janey and Deshpande, Shweta and Chiniquy, Jennifer and Ciobanu, Doina and Klenk, Hans-Peter and Zane, Matthew and Daum, Christopher and Clum, Alicia and Cheng, Jan-Fang and Copeland, Alex and Woyke, Tanja},
abstractNote = {Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, which assesses critical components such as sequencing errors and biases, relies on such datasets. Here we report next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes; cover a range of GC contents, genome sizes, and repeat content; and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving the current sequence data analysis toolkit and spur interest in the development of new tools.},
doi = {10.1038/sdata.2016.81},
journal = {Scientific Data},
number = 1,
volume = 3,
place = {United States},
year = {2016},
month = {9}
}

Citation Metrics:
Cited by: 61 works
Citation information provided by
Web of Science

Figures / Tables:

Table 1: Genome statistics of each mock community member. Genome size includes chromosomes and plasmids. All genomes are available as finished sequences. Phylum associations for each strain are abbreviated as follows: AD = Acidobacteria, AT = Actinobacteria, B = Bacteroidetes, D = Deinococcus-Thermus, E = Euryarchaeota, F = Firmicutes, P = Proteobacteria, S = Spirochaetes, T = Thermotogae, V = Verrucomicrobia. Isolation sources were obtained from literature on respective strains, where available. GC content is based on genome size. Genomes without NCBI repeat region annotation are denoted with an *.


Works referencing / citing this record:

Key sub-community dynamics of medium-chain carboxylate production
journal, May 2019

  • Lambrecht, Johannes; Cichocki, Nicolas; Schattenberg, Florian
  • Microbial Cell Factories, Vol. 18, Issue 1
  • DOI: 10.1186/s12934-019-1143-8

Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community
journal, August 2018

  • Hiraoka, Satoshi; Okazaki, Yusuke; Anda, Mizue
  • Nature Communications
  • DOI: 10.1101/380360

SpaRC: scalable sequence clustering using Apache Spark
journal, August 2018


MetaCarvel: linking assembly graph motifs to biological variants
journal, August 2019


Nitrogen cycling in Sandusky Bay, Lake Erie: oscillations between strong and weak export and implications for harmful algal blooms
journal, January 2018

  • Salk, Kateri R.; Bullerjahn, George S.; McKay, Robert Michael L.
  • Biogeosciences, Vol. 15, Issue 9
  • DOI: 10.5194/bg-15-2891-2018

MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs
journal, January 2018


Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis
journal, August 2018


Quantification of variation and the impact of biomass in targeted 16S rRNA gene sequencing studies
journal, September 2018


MetaCarvel: linking assembly graph motifs to biological variants
text, January 2019


Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies
journal, November 2019


De novo Nanopore read quality improvement using deep learning
journal, November 2019


Evaluation of Primers Targeting the Diazotroph Functional Gene and Development of NifMAP – A Bioinformatics Pipeline for Analyzing nifH Amplicon Data
journal, April 2018

  • Angel, Roey; Nepel, Maximilian; Panhölzl, Christopher
  • Frontiers in Microbiology, Vol. 9
  • DOI: 10.3389/fmicb.2018.00703

Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community
journal, January 2019


CAMISIM: simulating metagenomes and microbial communities
journal, February 2019


100‐year‐old enigma solved: identification, genomic characterization and biogeography of the yet uncultured Planctomyces bekefii
journal, November 2019

  • Dedysh, Svetlana N.; Henke, Petra; Ivanova, Anastasia A.
  • Environmental Microbiology, Vol. 22, Issue 1
  • DOI: 10.1111/1462-2920.14838

Neutral mechanisms and niche differentiation in steady-state insular microbial communities revealed by single cell analysis: Non-equilibria systems
journal, November 2018

  • Liu, Zishu; Cichocki, Nicolas; Hübschmann, Thomas
  • Environmental Microbiology, Vol. 21, Issue 1
  • DOI: 10.1111/1462-2920.14437

BBMerge – Accurate paired shotgun read merging via overlap
journal, October 2017


PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics
journal, October 2019


A 16S rDNA PCR-based theoretical to actual delta approach on culturable mock communities revealed severe losses of diversity information
journal, April 2019

  • dos Santos, Hellen Ribeiro Martins; Argolo, Caio Suzart; Argôlo-Filho, Ronaldo Costa
  • BMC Microbiology, Vol. 19, Issue 1
  • DOI: 10.1186/s12866-019-1446-2

Plasmid detection and assembly in genomic and metagenomic data sets
journal, May 2019

  • Antipov, Dmitry; Raiko, Mikhail; Lapidus, Alla
  • Genome Research, Vol. 29, Issue 6
  • DOI: 10.1101/gr.241299.118

Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy
journal, May 2018

  • Sieber, Christian M. K.; Probst, Alexander J.; Sharrar, Allison
  • Nature Microbiology, Vol. 3, Issue 7
  • DOI: 10.1038/s41564-018-0171-1

The use of next generation sequencing for improving food safety: Translation into practice
journal, June 2019

  • Jagadeesan, Balamurugan; Gerner-Smidt, Peter; Allard, Marc W.
  • Food Microbiology, Vol. 79
  • DOI: 10.1016/j.fm.2018.11.005

STROBE-metagenomics: a STROBE extension statement to guide the reporting of metagenomics studies
journal, October 2020


Identifying accurate metagenome and amplicon software via a meta-analysis of sequence to taxonomy benchmarking studies
journal, October 2018

  • Gardner, Paul P.; Watson, Renee J.; Morgan, Xochitl C.
  • PeerJ
  • DOI: 10.1101/202077

Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.