OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ProDeGe: A computational protocol for fully automated decontamination of genomes

Abstract

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
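
The abstract reports accuracy as length-weighted fractions of sequence and cost as CPU core hours per megabase. The Python sketch below shows that arithmetic. It assumes each contig carries a ground-truth label (target or non-target) and a classifier decision (kept as clean or removed as contaminant). The `Contig` class and the function names are hypothetical and not part of ProDeGe. The 0.30 rate is only the average quoted above, so real runtimes will vary with the input.

```python
from dataclasses import dataclass

# Assumption: the average rate quoted in the abstract (~0.30 CPU core hours per Mb).
CPU_CORE_HOURS_PER_MB = 0.30


@dataclass
class Contig:
    length: int       # contig length in bases
    is_target: bool   # ground truth: True if the contig belongs to the target organism
    kept: bool        # hypothetical classifier output: True if labeled "clean" and retained


def sensitivity_specificity(contigs):
    """Return (sensitivity, specificity) weighted by sequence length.

    sensitivity = fraction of target-organism bases that are retained
    specificity = fraction of non-target (contaminant) bases that are removed
    """
    target_total = sum(c.length for c in contigs if c.is_target)
    target_kept = sum(c.length for c in contigs if c.is_target and c.kept)
    contam_total = sum(c.length for c in contigs if not c.is_target)
    contam_removed = sum(c.length for c in contigs if not c.is_target and not c.kept)
    # Return NaN when a class is absent, since the ratio is undefined.
    sensitivity = target_kept / target_total if target_total else float("nan")
    specificity = contam_removed / contam_total if contam_total else float("nan")
    return sensitivity, specificity


def estimated_cpu_core_hours(total_bases, rate=CPU_CORE_HOURS_PER_MB):
    """Rough runtime estimate: megabases of input multiplied by the per-megabase rate."""
    return total_bases / 1_000_000 * rate


if __name__ == "__main__":
    # Hypothetical toy data set: 3 Mb of target sequence and 1 Mb of contaminant.
    contigs = [
        Contig(2_000_000, True, True),    # target contig correctly kept
        Contig(1_000_000, True, False),   # target contig wrongly removed
        Contig(800_000, False, False),    # contaminant correctly removed
        Contig(200_000, False, True),     # contaminant wrongly kept
    ]
    sens, spec = sensitivity_specificity(contigs)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    print(f"estimated CPU core hours: {estimated_cpu_core_hours(4_000_000):.2f}")
```

With these toy numbers, the script prints a sensitivity of 0.667 and a specificity of 0.800, and it estimates 1.20 CPU core hours for 4 Mb of input. The metrics are weighted by length rather than by contig count, which mirrors the abstract's phrasing ("84% of sequence"). As a result, one long misclassified contig counts for more than several short ones.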

Authors:
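Tennessen, Kristin [1]; Andersen, Evan [1]; Clingenpeel, Scott [1]; Rinke, Christian [1]; Lundberg, Derek S. [2]; Han, James [1]; Dangl, Jeff L. [2]; Ivanova, Natalia [1]; Woyke, Tanja [1]; Kyrpides, Nikos [1]; Pati, Amrita [1]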
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. Univ. of North Carolina, Chapel Hill, NC (United States)
Publication Date:
2015-06-09
Research Org.:
USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1346926
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
The ISME Journal
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1751-7362
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, and Pati, Amrita. ProDeGe: A computational protocol for fully automated decontamination of genomes. United States: N. p., 2015. Web. doi:10.1038/ismej.2015.100.
Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, & Pati, Amrita. ProDeGe: A computational protocol for fully automated decontamination of genomes. United States. https://doi.org/10.1038/ismej.2015.100
Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, and Pati, Amrita. 2015. "ProDeGe: A computational protocol for fully automated decontamination of genomes". United States. https://doi.org/10.1038/ismej.2015.100. https://www.osti.gov/servlets/purl/1346926.
@article{osti_1346926,
title = {ProDeGe: A computational protocol for fully automated decontamination of genomes},
author = {Tennessen, Kristin and Andersen, Evan and Clingenpeel, Scott and Rinke, Christian and Lundberg, Derek S. and Han, James and Dangl, Jeff L. and Ivanova, Natalia and Woyke, Tanja and Kyrpides, Nikos and Pati, Amrita},
abstractNote = {Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.},
doi = {10.1038/ismej.2015.100},
url = {https://www.osti.gov/biblio/1346926},
journal = {The ISME Journal},
issn = {1751-7362},
number = 1,
volume = 10,
place = {United States},
year = {2015},
month = jun
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 40 works
Citation information provided by Web of Science


Works referenced in this record:

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
journal, May 2012


BLAST+: architecture and applications
journal, January 2009


Targeted metagenomics and ecology of globally important uncultured eukaryotic phytoplankton
journal, July 2010


Hidden Diversity in Honey Bee Gut Symbionts Detected by Single-Cell Genomics
journal, September 2014


Genomic insights into the uncultivated marine Zetaproteobacteria at Loihi Seamount
journal, October 2014


SmashCell: a software framework for the analysis of single-cell amplified genome sequences
journal, October 2010


Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010


Single-cell genomics
journal, March 2011


IMG 4 version of the integrated microbial genomes comparative analysis system
journal, October 2013


Large-scale contamination of microbial isolate genomes by Illumina PhiX control
journal, March 2015


Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
journal, July 2014


Insights into the phylogeny and coding potential of microbial dark matter
journal, July 2013


Genomes from Metagenomics
journal, November 2013


Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean
journal, June 2013


Decontamination of MDA Reagents for Single Cell Whole Genome Amplification
journal, October 2011


Works referencing / citing this record:

Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, August 2017


Single-cell genome sequencing: current state of the science
journal, January 2016


Niche differentiation is spatially and temporally regulated in the rhizosphere
journal, January 2020


Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics
journal, July 2017


Obtaining high-quality draft genomes from uncultured microbes by cleaning and co-assembly of single-cell amplified genomes
journal, February 2018


Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
journal, October 2016


ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences
journal, June 2017


Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis
journal, July 2016


acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
journal, December 2016


BlobTools: Interrogation of genome assemblies
journal, January 2017


Consensus assessment of the contamination level of publicly available cyanobacterial genomes
journal, July 2018


Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies
journal, December 2019


Deciphering the Human Virome with Single-Virus Genomics and Metagenomics
journal, March 2018


Erratum: Corrigendum: Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, July 2018


Single-virus genomics reveals hidden cosmopolitan and abundant viruses
journal, June 2017


Draft Genome Sequence of a Pseudomonas aeruginosa Strain Able To Decompose N,N-Dimethyl Formamide
journal, February 2016


Defending Our Public Biological Databases as a Global Critical Infrastructure
journal, April 2019


Improved Environmental Genomes via Integration of Metagenomic and Single-Cell Assemblies
journal, February 2016