OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ProDeGe: A computational protocol for fully automated decontamination of genomes

Journal Article · The ISME Journal

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
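The abstract scores decontamination by the fraction of sequence removed or retained, not by the number of contigs. A minimal sketch of how such length-weighted scores might be computed on a benchmark with known origins follows. It assumes that "sequence" means base pairs, that each contig receives a single clean/contaminant call, and that a ground-truth label is available for every contig (for example, from an artificial mix of a target genome and known contaminants). The `Contig` type and both function names are hypothetical illustrations. They are not part of ProDeGe, and the sketch does not reproduce ProDeGe's homology- and feature-based classifier. It only scores a classifier's output and scales the reported ~0.30 CPU core hours per megabase to a given input size.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Contig:
    """One contig from a benchmark set with known origin (hypothetical structure)."""
    name: str
    length: int      # contig length in base pairs
    is_target: bool  # ground truth: True if the contig comes from the target organism
    kept: bool       # tool output: True if classified "clean", False if "contaminant"


def sequence_weighted_rates(contigs: Iterable[Contig]) -> tuple[float, float]:
    """Return (sensitivity, specificity), weighted by base pairs.

    sensitivity = target bases retained / all target bases
    specificity = non-target bases removed / all non-target bases
    """
    target_total = target_kept = 0
    other_total = other_removed = 0
    for c in contigs:
        if c.is_target:
            target_total += c.length
            if c.kept:
                target_kept += c.length
        else:
            other_total += c.length
            if not c.kept:
                other_removed += c.length
    if target_total == 0 or other_total == 0:
        raise ValueError("need both target and non-target sequence to score")
    return target_kept / target_total, other_removed / other_total


def estimated_core_hours(total_bases: int, core_hours_per_mb: float = 0.30) -> float:
    """Scale the reported ~0.30 CPU core hours per megabase to a given input size."""
    return total_bases / 1_000_000 * core_hours_per_mb
```

Weighting by length means that a misclassified long contig costs more than a misclassified short one. This follows the abstract's wording of "84% of sequence" rather than a contig count. At the reported rate, a hypothetical 1.5 Mb draft genome would need roughly 1.5 × 0.30 ≈ 0.45 CPU core hours.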

Research Organization:
USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1346926
Journal Information:
The ISME Journal, Vol. 10, Issue 1; ISSN 1751-7362
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 40 works
Citation information provided by Web of Science

Cited By (24)

Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea journal August 2017
Single-cell genome sequencing: current state of the science journal January 2016
Niche differentiation is spatially and temporally regulated in the rhizosphere journal January 2020
Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics journal July 2017
Obtaining high-quality draft genomes from uncultured microbes by cleaning and co-assembly of single-cell amplified genomes journal February 2018
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements journal October 2016
ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences journal June 2017
Genomic evidence for distinct carbon substrate preferences and ecological niches of Bathyarchaeota in estuarine sediments journal January 2016
Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis journal July 2016
Draft Genome Sequence of Aeribacillus pallidus Strain 8m3, a Thermophilic Hydrocarbon-Oxidizing Bacterium Isolated from the Dagang Oil Field (China) journal June 2016
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data journal December 2016
Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination journal December 2019
BlobTools: Interrogation of genome assemblies journal January 2017
Consensus assessment of the contamination level of publicly available cyanobacterial genomes journal July 2018
Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies journal December 2019
Capturing One of the Human Gut Microbiome’s Most Wanted: Reconstructing the Genome of a Novel Butyrate-Producing, Clostridial Scavenger from Metagenomic Sequence Data journal May 2016
Deciphering the Human Virome with Single-Virus Genomics and Metagenomics journal March 2018
Erratum: Corrigendum: Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea journal July 2018
Single-virus genomics reveals hidden cosmopolitan and abundant viruses journal June 2017
Draft Genome Sequence of Chloroflexus sp. Strain isl-2, a Thermophilic Filamentous Anoxygenic Phototrophic Bacterium Isolated from the Strokkur Geyser, Iceland journal August 2016
Draft Genome Sequence of a Pseudomonas aeruginosa Strain Able To Decompose N,N-Dimethyl Formamide journal February 2016
Defending Our Public Biological Databases as a Global Critical Infrastructure journal April 2019
Improved Environmental Genomes via Integration of Metagenomic and Single-Cell Assemblies journal February 2016
Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems journal October 2017