DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A machine learning-based service for estimating quality of genomes using PATRIC

Journal Article · BMC Bioinformatics
 [1];  [2];  [3];  [2];  [4];  [3];  [3];  [1]
  1. Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States); Univ. of Chicago, Chicago, IL (United States)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States)
  4. Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States); Argonne National Lab. (ANL), Lemont, IL (United States)

Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC); National Institutes of Health (NIH) - National Institute of Allergy and Infectious Diseases (NIAID)
Grant/Contract Number:
AC02-06CH11357; HHSN272201400027C
OSTI ID:
1579345
Journal Information:
BMC Bioinformatics, Vol. 20, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 19 works
Citation information provided by
Web of Science

References (20)

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center · journal · November 2016
PATRIC: The VBI PathoSystems Resource Integration Center · journal · January 2007
Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle · journal · January 2019
Anvi’o: an advanced analysis and visualization platform for ‘omics data · journal · January 2015
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs · journal · June 2015
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes · journal · May 2015
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities · journal · January 2015
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes · journal · February 2015
The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes · journal · September 2005
The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST) · journal · November 2013
Salinimonas marina sp. nov. Isolated from Jeju Island Marine Sediment · journal · June 2021

Cited By (2)

The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities · journal · October 2019
Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC · journal · April 2021