ProDeGe: A computational protocol for fully automated decontamination of genomes
Abstract
Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
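The performance figures and runtime rate quoted in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not ProDeGe's own code: the function names and the example base-pair counts are hypothetical, chosen only to illustrate the definitions used above (specificity as the fraction of non-target sequence removed, sensitivity as the fraction of target sequence retained) and the approximate rate of 0.30 CPU core hours per megabase.

```python
# Illustrative only: function names and example numbers are assumptions,
# not taken from ProDeGe itself.

CPU_CORE_HOURS_PER_MB = 0.30  # approximate rate quoted in the abstract


def sensitivity(target_bp_retained: int, target_bp_total: int) -> float:
    """Fraction of target-organism sequence kept after decontamination."""
    return target_bp_retained / target_bp_total


def specificity(contaminant_bp_removed: int, contaminant_bp_total: int) -> float:
    """Fraction of non-target (contaminant) sequence that was removed."""
    return contaminant_bp_removed / contaminant_bp_total


def estimated_core_hours(assembly_bp: int,
                         rate: float = CPU_CORE_HOURS_PER_MB) -> float:
    """Rough CPU cost for an assembly of the given size (1 Mb = 1e6 bp)."""
    return assembly_bp / 1_000_000 * rate


if __name__ == "__main__":
    # Hypothetical draft genome: 2.5 Mb of target sequence, 0.5 Mb contaminant.
    print(f"sensitivity: {sensitivity(2_100_000, 2_500_000):.2f}")
    print(f"specificity: {specificity(420_000, 500_000):.2f}")
    print(f"core hours:  {estimated_core_hours(3_000_000):.2f}")
```

Measuring both quantities in base pairs rather than in contig counts follows the abstract's wording ("% of sequence"); a contig-count version would weight short and long contigs equally and could give quite different numbers.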
- Authors:
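- Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; Rinke, Christian; Lundberg, Derek S.; Han, James; Dangl, Jeff L.; Ivanova, Natalia; Woyke, Tanja; Kyrpides, Nikos; Pati, Amrita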
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- Univ. of North Carolina, Chapel Hill, NC (United States)
- Publication Date:
- 2015-06-09
- Research Org.:
- USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1346926
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- The ISME Journal
- Additional Journal Information:
- Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1751-7362
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING
Citation Formats
Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, and Pati, Amrita. ProDeGe: A computational protocol for fully automated decontamination of genomes. United States: N. p., 2015. Web. doi:10.1038/ismej.2015.100.
Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, & Pati, Amrita. ProDeGe: A computational protocol for fully automated decontamination of genomes. United States. https://doi.org/10.1038/ismej.2015.100
Tennessen, Kristin, Andersen, Evan, Clingenpeel, Scott, Rinke, Christian, Lundberg, Derek S., Han, James, Dangl, Jeff L., Ivanova, Natalia, Woyke, Tanja, Kyrpides, Nikos, and Pati, Amrita. 2015. "ProDeGe: A computational protocol for fully automated decontamination of genomes". United States. https://doi.org/10.1038/ismej.2015.100. https://www.osti.gov/servlets/purl/1346926.
@article{osti_1346926,
title = {ProDeGe: A computational protocol for fully automated decontamination of genomes},
author = {Tennessen, Kristin and Andersen, Evan and Clingenpeel, Scott and Rinke, Christian and Lundberg, Derek S. and Han, James and Dangl, Jeff L. and Ivanova, Natalia and Woyke, Tanja and Kyrpides, Nikos and Pati, Amrita},
abstractNote = {Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.},
doi = {10.1038/ismej.2015.100},
url = {https://www.osti.gov/biblio/1346926},
journal = {The ISME Journal},
issn = {1751-7362},
number = 1,
volume = 10,
place = {United States},
year = {2015},
month = {6}
}
Works referenced in this record:
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
journal, May 2012
- Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry
- Journal of Computational Biology, Vol. 19, Issue 5
BLAST+: architecture and applications
journal, January 2009
- Camacho, Christiam; Coulouris, George; Avagyan, Vahram
- BMC Bioinformatics, Vol. 10, Issue 1
Targeted metagenomics and ecology of globally important uncultured eukaryotic phytoplankton
journal, July 2010
- Cuvelier, M. L.; Allen, A. E.; Monier, A.
- Proceedings of the National Academy of Sciences, Vol. 107, Issue 33
Hidden Diversity in Honey Bee Gut Symbionts Detected by Single-Cell Genomics
journal, September 2014
- Engel, Philipp; Stepanauskas, Ramunas; Moran, Nancy A.
- PLoS Genetics, Vol. 10, Issue 9
Genomic insights into the uncultivated marine Zetaproteobacteria at Loihi Seamount
journal, October 2014
- Field, Erin K.; Sczyrba, Alexander; Lyman, Audrey E.
- The ISME Journal, Vol. 9, Issue 4
SmashCell: a software framework for the analysis of single-cell amplified genome sequences
journal, October 2010
- Harrington, Eoghan D.; Arumugam, Manimozhiyan; Raes, Jeroen
- Bioinformatics, Vol. 26, Issue 23
Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010
- Hyatt, Doug; Chen, Gwo-Liang; LoCascio, Philip F.
- BMC Bioinformatics, Vol. 11, Issue 1
Single-cell genomics
journal, March 2011
- Kalisky, Tomer; Quake, Stephen R.
- Nature Methods, Vol. 8, Issue 4
IMG 4 version of the integrated microbial genomes comparative analysis system
journal, October 2013
- Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna
- Nucleic Acids Research, Vol. 42, Issue D1
Large-scale contamination of microbial isolate genomes by Illumina PhiX control
journal, March 2015
- Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia
- Standards in Genomic Sciences, Vol. 10, Issue 1
Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
journal, July 2014
- Nielsen, H. Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska
- Nature Biotechnology, Vol. 32, Issue 8
Insights into the phylogeny and coding potential of microbial dark matter
journal, July 2013
- Rinke, Christian; Schwientek, Patrick; Sczyrba, Alexander
- Nature, Vol. 499, Issue 7459
Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets
journal, March 2011
- Schmieder, Robert; Edwards, Robert
- PLoS ONE, Vol. 6, Issue 3
Genomes from Metagenomics
journal, November 2013
- Sharon, I.; Banfield, J. F.
- Science, Vol. 342, Issue 6162
Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean
journal, June 2013
- Swan, B. K.; Tupper, B.; Sczyrba, A.
- Proceedings of the National Academy of Sciences, Vol. 110, Issue 28
Decontamination of MDA Reagents for Single Cell Whole Genome Amplification
journal, October 2011
- Woyke, Tanja; Sczyrba, Alexander; Lee, Janey
- PLoS ONE, Vol. 6, Issue 10
Works referencing / citing this record:
Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, August 2017
- Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
- Nature Biotechnology, Vol. 35, Issue 8
Single-cell genome sequencing: current state of the science
journal, January 2016
- Gawad, Charles; Koh, Winston; Quake, Stephen R.
- Nature Reviews Genetics, Vol. 17, Issue 3
Niche differentiation is spatially and temporally regulated in the rhizosphere
journal, January 2020
- Nuccio, Erin E.; Starr, Evan; Karaoz, Ulas
- The ISME Journal, Vol. 14, Issue 4
Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics
journal, July 2017
- Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato
- Scientific Reports, Vol. 7, Issue 1
Obtaining high-quality draft genomes from uncultured microbes by cleaning and co-assembly of single-cell amplified genomes
journal, February 2018
- Kogawa, Masato; Hosokawa, Masahito; Nishikawa, Yohei
- Scientific Reports, Vol. 8, Issue 1
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
journal, October 2016
- Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon
- Nucleic Acids Research, Vol. 45, Issue D1
ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences
journal, June 2017
- Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min
- International Journal of Systematic and Evolutionary Microbiology, Vol. 67, Issue 6
Genomic evidence for distinct carbon substrate preferences and ecological niches of Bathyarchaeota in estuarine sediments
journal, January 2016
- Lazar, Cassandre Sara; Baker, Brett J.; Seitz, Kiley
- Environmental Microbiology, Vol. 18, Issue 4
Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis
journal, July 2016
- Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Michael S.
- Applied and Environmental Microbiology, Vol. 82, Issue 18
Draft Genome Sequence of Aeribacillus pallidus Strain 8m3, a Thermophilic Hydrocarbon-Oxidizing Bacterium Isolated from the Dagang Oil Field (China)
journal, June 2016
- Poltaraus, Andrey B.; Sokolova, Diyana S.; Grouzdev, Denis S.
- Genome Announcements, Vol. 4, Issue 3
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
journal, December 2016
- Lux, Markus; Krüger, Jan; Rinke, Christian
- BMC Bioinformatics, Vol. 17, Issue 1
Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination
journal, December 2019
- Pightling, Arthur W.; Pettengill, James B.; Wang, Yu
- Genome Biology, Vol. 20, Issue 1
BlobTools: Interrogation of genome assemblies
journal, January 2017
- Laetsch, Dominik R.; Blaxter, Mark L.
- F1000Research, Vol. 6
Consensus assessment of the contamination level of publicly available cyanobacterial genomes
journal, July 2018
- Cornet, Luc; Meunier, Loïc; Van Vlierberghe, Mick
- PLOS ONE, Vol. 13, Issue 7
Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies
journal, December 2019
- Francois, Clementine M.; Durand, Faustine; Figuet, Emeric
- G3: Genes|Genomes|Genetics, Vol. 10, Issue 2
Capturing One of the Human Gut Microbiome’s Most Wanted: Reconstructing the Genome of a Novel Butyrate-Producing, Clostridial Scavenger from Metagenomic Sequence Data
journal, May 2016
- Jeraldo, Patricio; Hernandez, Alvaro; Nielsen, Henrik B.
- Frontiers in Microbiology, Vol. 7
Deciphering the Human Virome with Single-Virus Genomics and Metagenomics
journal, March 2018
- de la Cruz Peña, Maria; Martinez-Hernandez, Francisco; Garcia-Heredia, Inmaculada
- Viruses, Vol. 10, Issue 3
Erratum: Corrigendum: Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, July 2018
- Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
- Nature Biotechnology, Vol. 36, Issue 7
Single-virus genomics reveals hidden cosmopolitan and abundant viruses
journal, June 2017
- Martinez-Hernandez, Francisco; Fornas, Oscar; Lluesma Gomez, Monica
- Nature Communications, Vol. 8, Issue 1
Draft Genome Sequence of Chloroflexus sp. Strain isl-2, a Thermophilic Filamentous Anoxygenic Phototrophic Bacterium Isolated from the Strokkur Geyser, Iceland
journal, August 2016
- Gaisin, Vasil A.; Ivanov, Timophey M.; Kuznetsov, Boris B.
- Genome Announcements, Vol. 4, Issue 4
Draft Genome Sequence of a Pseudomonas aeruginosa Strain Able To Decompose N,N-Dimethyl Formamide
journal, February 2016
- Yan, Lingyue; Yan, Ming; Xu, Lin
- Genome Announcements, Vol. 4, Issue 1
Defending Our Public Biological Databases as a Global Critical Infrastructure
journal, April 2019
- Caswell, Jacob; Gans, Jason D.; Generous, Nicholas
- Frontiers in Bioengineering and Biotechnology, Vol. 7
Improved Environmental Genomes via Integration of Metagenomic and Single-Cell Assemblies
journal, February 2016
- Mende, Daniel R.; Aylward, Frank O.; Eppley, John M.
- Frontiers in Microbiology, Vol. 7
Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems
journal, October 2017
- Zhang, Ya; Kitajima, Masaaki; Whittle, Andrew J.
- Frontiers in Microbiology, Vol. 8