Title: A method for achieving complete microbial genomes and improving bins from metagenomics data

Journal Article · PLoS Computational Biology (Online)
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Innovative Genomics Inst., Berkeley, CA (United States)

Metagenomics facilitates the study of genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes; these efforts often focus on species with circular genomes, so that circularity can help confirm completeness. However, fewer than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data, despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing the complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein-coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA.
Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is available on the DOE Systems Biology KnowledgeBase as a beta app.
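
The abstract relies on circularity as evidence of completeness. As a rough illustration only, and not the Jorg implementation, one common heuristic is to check whether the two ends of a contig repeat each other, which can happen when an assembler walks once around a circular genome and runs back into its own start. The sketch below assumes a plain DNA string as input; the function names `terminal_overlap` and `trim_circular` and the length thresholds are hypothetical choices made for this example.

```python
from typing import Tuple


def terminal_overlap(seq: str, min_len: int = 20, max_len: int = 500) -> int:
    """Length of the longest prefix of seq that also ends seq.

    Only overlaps between min_len and max_len bases (and at most half
    the sequence length) are considered. Returns 0 if none is found.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the largest overlap wins.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 20, max_len: int = 500) -> Tuple[str, bool]:
    """Drop the duplicated end of a putatively circular contig.

    Returns (sequence, is_circular). The flag is only a hint: a terminal
    repeat can also come from a genomic repeat, so it does not prove
    that the genome is complete.
    """
    k = terminal_overlap(seq, min_len, max_len)
    if k:
        return seq[:-k], True
    return seq, False
```

For example, `trim_circular(contig)` on a contig whose last bases repeat its first bases returns the contig with one copy of the repeat removed and `True`. An exact-match check like this is deliberately strict; it would miss overlaps containing sequencing errors, and any candidate should still be confirmed by read mapping across the junction, in keeping with the iterative approach the abstract describes.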

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1788019
Journal Information:
PLoS Computational Biology (Online), Vol. 17, Issue 5; ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
