A machine learning-based service for estimating quality of genomes using PATRIC
- Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States); Univ. of Chicago, Chicago, IL (United States)
- Argonne National Lab. (ANL), Lemont, IL (United States)
- Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States)
- Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.
- Research Organization: Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Organization: USDOE Office of Science (SC); National Institutes of Health (NIH) - National Institute of Allergy and Infectious Diseases (NIAID)
- Grant/Contract Number: AC02-06CH11357; HHSN272201400027C
- OSTI ID: 1579345
- Journal Information: BMC Bioinformatics, Vol. 20, Issue 1; ISSN 1471-2105
- Publisher: BioMed Central
- Country of Publication: United States
- Language: English
The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities (journal, October 2019)
Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC (journal, April 2021)
Similar Records
PATRIC, the bacterial bioinformatics database and analysis resource
METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks