DOE PAGES

Title: ProDeGe: A computational protocol for fully automated decontamination of genomes

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
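The abstract defines specificity and sensitivity in terms of sequence rather than contig counts, and quotes a throughput of ~0.30 CPU core hours per megabase. A minimal sketch of how those quantities could be computed for a benchmark set follows. It is not ProDeGe code: the `Contig` record, its field names, and the three helper functions are hypothetical, and measuring sequence in base pairs is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Contig:
    """Hypothetical benchmark record; not part of ProDeGe."""
    length_bp: int   # contig length in base pairs
    is_target: bool  # ground truth: sequence comes from the target organism
    kept: bool       # classifier output: True = "clean", False = "contaminant"


def sensitivity(contigs: Iterable[Contig]) -> float:
    """Fraction of target-organism sequence (in bp) that is retained."""
    contigs = list(contigs)
    target_bp = sum(c.length_bp for c in contigs if c.is_target)
    kept_bp = sum(c.length_bp for c in contigs if c.is_target and c.kept)
    return kept_bp / target_bp if target_bp else float("nan")


def specificity(contigs: Iterable[Contig]) -> float:
    """Fraction of non-target sequence (in bp) that is removed."""
    contigs = list(contigs)
    non_target_bp = sum(c.length_bp for c in contigs if not c.is_target)
    removed_bp = sum(
        c.length_bp for c in contigs if not c.is_target and not c.kept
    )
    return removed_bp / non_target_bp if non_target_bp else float("nan")


def estimated_core_hours(total_bp: int, rate_per_mb: float = 0.30) -> float:
    """Rough cost estimate using the ~0.30 core hours/Mb figure quoted above."""
    return total_bp / 1_000_000 * rate_per_mb


# Made-up example data, for illustration only.
example = [
    Contig(800_000, is_target=True, kept=True),
    Contig(200_000, is_target=True, kept=False),
    Contig(300_000, is_target=False, kept=False),
    Contig(100_000, is_target=False, kept=True),
]
```

Weighting by base pairs rather than by contig count means that one long misclassified contig affects the score more than several short ones, which matches the abstract's phrasing of "sequence" removed or retained.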
Authors:
 [1] ;  [1] ;  [1] ;  [1] ;  [2] ;  [1] ;  [2] ;  [1] ;  [1] ;  [1] ;  [1]
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. Univ. of North Carolina, Chapel Hill, NC (United States)
Publication Date:
Grant/Contract Number:
AC02-05CH11231
Type:
Accepted Manuscript
Journal Name:
The ISME Journal
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1751-7362
Publisher:
Nature Publishing Group
Research Org:
Dept. of Energy Joint Genome Inst., Walnut Creek, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING
OSTI Identifier:
1346926