OSTI.GOV, U.S. Department of Energy, Office of Scientific and Technical Information

Title: Quality scores for 32,000 genomes

Journal Article · Standards in Genomic Sciences
 [1];  [2];  [1];  [3];  [4];  [5];  [6]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division. Comparative Genomics Group
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division. Comparative Genomics Group; Univ. of Tennessee, Knoxville, TN (United States). Joint Inst. for Biological Sciences
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division
  4. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division. Comparative Genomics Group; Univ. of Tennessee, Knoxville, TN (United States). Joint Inst. for Biological Sciences; Univ. of Tennessee, Knoxville, TN (United States). Dept. of Microbiology
  5. Technical Univ. of Denmark, Lyngby (Denmark). Center for Genomic Epidemiology
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division. Comparative Genomics Group; Univ. of Tennessee, Knoxville, TN (United States). Joint Inst. for Biological Sciences; Technical Univ. of Denmark, Lyngby (Denmark). Dept. of Systems Biology. Center for Biological Sequence Analysis

More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October 2013). In this study, we examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores to more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition, and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. Comparing genomes across factors that may influence the score, we found that sequencing depth coverage of over 100x did not ensure a better score, whereas sequencing read length was a better indicator of sequencing quality. With few exceptions, the 30,000 genomes have nearly all of the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” in cases where reference data are either not available or not applicable. The scores also highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally, and unexpectedly, the comparison of predicted tRNAs across 15,000 high-quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
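As an illustration only, the sketch below shows how a four-category score might be combined and used as a screening threshold, and how an anticodon maps to the codon it pairs with. The abstract does not give the scoring formula, so the equal-weight average, the `GenomeScores` class, the `screen` function, and all sub-score values are assumptions made for this example. Only the four category names, the 0.8 cutoff, and the ‘A’-anticodon/‘U’-codon relationship come from the text above.

```python
from dataclasses import dataclass

# Watson-Crick complements for RNA bases (wobble pairing is ignored here).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


@dataclass
class GenomeScores:
    """Hypothetical per-genome sub-scores, each assumed to lie in [0, 1]."""
    name: str
    assembly: float         # completeness of the assembly
    rrna: float             # presence of full-length rRNA genes
    trna: float             # tRNA composition
    essential_genes: float  # share of the 102 conserved genes found

    def total(self) -> float:
        # ASSUMPTION: equal weighting; the abstract does not state the formula.
        parts = (self.assembly, self.rrna, self.trna, self.essential_genes)
        return sum(parts) / len(parts)


def screen(genomes, threshold=0.8):
    """Keep only genomes whose combined score meets the threshold."""
    return [g for g in genomes if g.total() >= threshold]


def codon_for_anticodon(anticodon: str) -> str:
    """Return the codon (5'->3') that pairs exactly with an anticodon (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    genomes = [
        GenomeScores("genome_a", 1.0, 1.0, 0.9, 1.0),  # made-up values
        GenomeScores("genome_b", 0.5, 0.0, 0.7, 0.9),  # made-up values
    ]
    print([g.name for g in screen(genomes)])
    print(codon_for_anticodon("ACG"))
```

Because the anticodon is read antiparallel to the codon, its first base pairs with the codon's third base. An anticodon beginning with ‘A’ therefore corresponds to a codon ending with ‘U’, as with the arginine codon CGU mentioned in the abstract.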

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Laboratory Directed Research and Development (LDRD)
Grant/Contract Number:
AC05-00OR22725; PS02-06ER64304
OSTI ID:
1185423
Journal Information:
Standards in Genomic Sciences, Vol. 9; ISSN 1944-3277
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 30 works (citation information provided by Web of Science)

Cited By (19)

Bioprospecting Archaea: Focus on Extreme Halophiles book December 2016
Advancements in Microbial Genome Sequencing and Microbial Community Characterization book January 2019
Insights from 20 years of bacterial genome sequencing journal February 2015
FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science journal July 2019
Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea journal December 2019
Microbiome analyses of blood and tissues suggest cancer diagnostic approach journal March 2020
Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae journal October 2015
Genomic characterization of Nontuberculous Mycobacteria journal March 2017
Genome Evolution of Bartonellaceae Symbionts of Ants at the Opposite Ends of the Trophic Scale journal July 2018
What can we learn from over 100,000 Escherichia coli genomes? journal January 2020
Assessment of genome annotation using gene function similarity within the gene neighborhood journal July 2017
Impact of the choice of reference genome on the ability of the core genome SNV methodology to distinguish strains of Salmonella enterica serovar Heidelberg journal February 2018
Pan4Draft: A Computational Tool to Improve the Accuracy of Pan-Genomic Analysis Using Draft Genomes journal June 2018
The landscape of microbial phenotypic traits and associated genes journal October 2016
dBBQs : dataBase of Bacterial Quality scores journal September 2017
Analysis of Draft Genome Sequence of Pseudomonas sp. QTF5 Reveals Its Benzoic Acid Degradation Ability and Heavy Metal Tolerance journal January 2017
Molecular tools in understanding the evolution of Vibrio cholerae journal October 2015
Arcobacter cryaerophilus Isolated From New Zealand Mussels Harbor a Putative Virulence Plasmid journal August 2019
Quality Assessment of Domesticated Animal Genome Assemblies journal January 2015