U.S. Department of Energy
Office of Scientific and Technical Information

Biases in genome reconstruction from metagenomic data

Journal Article · PeerJ
DOI: https://doi.org/10.7717/peerj.10119 · OSTI ID: 1693794
 [1];  [2];  [3]
  1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
  2. Department of Biological Sciences, Marine Environmental Biology Section, University of Southern California, Los Angeles, CA, USA; Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA, USA
  3. Chemical and Biological Signature Science Group, Pacific Northwest National Laboratory, Richland, WA, USA
Background

Advances in sequencing, assembly, and the assortment of contigs into species-specific bins have enabled the reconstruction of genomes from metagenomic data, known as metagenome-assembled genomes (MAGs). Although genome reconstruction is a powerful technique, it is difficult to determine whether assembly and binning are accurate when applied to environmental metagenomes, because complete reference genome sequences against which to check the resulting MAGs are lacking.
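Composition-based binning software commonly groups contigs by their tetranucleotide frequencies, usually alongside read coverage. The following is a minimal sketch, not the feature definition of any particular binner: it computes a strand-independent profile over the 136 canonical tetranucleotides and a Euclidean distance between profiles. The function names `canonical`, `tetranucleotide_profile`, and `profile_distance` are hypothetical.

```python
import math
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


# The 256 tetranucleotides collapse to 136 canonical (strand-independent) forms.
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_profile(seq: str) -> list[float]:
    """Relative frequencies of canonical tetranucleotides.

    Windows containing any non-ACGT base (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CANONICAL_TETRAMERS)
    return [counts[k] / total for k in CANONICAL_TETRAMERS]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two profiles; smaller means more similar composition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Because each k-mer is folded together with its reverse complement, a contig and its reverse complement produce identical profiles, so contig orientation does not affect binning. A region whose profile lies far from the rest of its genome is the kind of atypical-composition sequence that a composition-driven binner may fail to place.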

Methods

We compared MAGs derived from an enrichment culture containing ~20 organisms to the complete genome sequences of 10 organisms isolated from that culture. Two factors commonly considered by binning software, nucleotide composition and sequence repetitiveness, were calculated for both the correctly binned and the not-binned regions. This direct comparison revealed biases in sequence characteristics and gene content in the not-binned regions. Additionally, to verify that these biases are also observable in more complex data sets and with three contemporary binning software packages, we compared the composition of three public data sets of MAGs reconstructed from Tara Oceans metagenomic data to a set of representative genomes available through NCBI RefSeq.
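The binned versus not-binned comparison can be sketched as follows, under stated assumptions: binned contigs have already been aligned to a complete isolate genome, and their aligned spans are supplied as 0-based, half-open `(start, end)` intervals. The not-binned regions are then the complement of those intervals. GC fraction stands in for nucleotide composition, and the share of k-mers that occur more than once in the genome stands in for repetitiveness. These metric definitions, the default `k=21`, and all function names are illustrative assumptions, not the authors' exact procedure.

```python
from collections import Counter


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def not_binned_regions(binned, genome_length):
    """Complement of the binned intervals within [0, genome_length)."""
    gaps, pos = [], 0
    for start, end in merge_intervals(binned):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < genome_length:
        gaps.append((pos, genome_length))
    return gaps


def gc_fraction(seq):
    """G+C fraction among unambiguous bases (NaN if there are none)."""
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


def repeat_fraction(genome, regions, k=21):
    """Share of k-mer start positions in `regions` whose k-mer occurs more than
    once in the whole genome (forward strand only; a crude repetitiveness measure)."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    last_start = len(genome) - k + 1
    hits = total = 0
    for start, end in regions:
        for i in range(start, min(end, last_start)):
            total += 1
            hits += int(counts[genome[i:i + k]] > 1)
    return hits / total if total else float("nan")


def compare_binned(genome, binned, k=21):
    """Summarize GC content and repetitiveness of binned vs not-binned regions."""
    genome = genome.upper()
    binned = merge_intervals(binned)
    gaps = not_binned_regions(binned, len(genome))

    def joined(regions):
        return "".join(genome[s:e] for s, e in regions)

    return {
        "binned_gc": gc_fraction(joined(binned)),
        "not_binned_gc": gc_fraction(joined(gaps)),
        "binned_repeats": repeat_fraction(genome, binned, k),
        "not_binned_repeats": repeat_fraction(genome, gaps, k),
    }
```

Working from the merged intervals means that overlapping alignments of different binned contigs are not double-counted. Per-region values computed this way could then be compared between the two groups with a nonparametric test, which is a choice left open here.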

Results

Repeat sequences were frequently left unbinned during genome reconstruction, as were regions with atypical nucleotide composition. Genes encoded in the not-binned regions were strongly biased towards ribosomal RNAs, transfer RNAs, mobile element functions, and genes of unknown function. Our results support genome reconstruction as a robust process and suggest that reconstructions determined to be >90% complete are likely to represent organismal function effectively; however, genotypic heterogeneity within natural populations, such as the uneven distribution of plasmids, can lead to incorrect inferences.
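Completeness figures such as ">90%" are usually estimated from genes expected to occur once per genome. Tools such as CheckM refine this with lineage-specific, collocated marker sets, which the following does not reproduce. This is a simplified, hypothetical sketch: completeness is the percentage of expected single-copy markers found at least once, and redundancy is the percentage of extra copies, a common proxy for contamination.

```python
from collections import Counter


def marker_gene_estimate(found_markers, expected_markers):
    """Simplified marker-gene completeness and redundancy, in percent.

    found_markers: marker IDs detected in the genome bin (duplicates allowed).
    expected_markers: marker IDs expected to occur once per genome.
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("expected_markers must not be empty")
    # Ignore detections that are not in the expected marker set.
    counts = Counter(m for m in found_markers if m in expected)
    completeness = 100.0 * len(counts) / len(expected)
    # Every extra copy of a single-copy marker counts toward redundancy.
    redundancy = 100.0 * sum(c - 1 for c in counts.values()) / len(expected)
    return completeness, redundancy


if __name__ == "__main__":
    expected = [f"marker_{i}" for i in range(10)]
    found = expected[:9] + ["marker_0"]  # nine distinct markers, one duplicated
    completeness, redundancy = marker_gene_estimate(found, expected)
    print(completeness, redundancy, completeness > 90)  # prints: 90.0 10.0 False
```

As the study's results suggest, a high completeness estimate alone does not guarantee that repeats, rRNA operons, or plasmid-borne genes were recovered. Those regions can be missing even when most single-copy markers are present.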

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
Center for Dark Energy Biosphere Investigations; USDOE
Grant/Contract Number:
AC02-05CH11231; AC05-76RL01830
OSTI ID:
1693794
Alternate ID(s):
OSTI ID: 1706676
Report Number(s):
PNNL-SA--151193; e10119
Journal Information:
PeerJ, Vol. 8; ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
