Title: Toward a standard in structural genome annotation for prokaryotes

Journal Article · Standards in Genomic Sciences
 [1];  [2];  [3];  [4];  [1];  [1];  [1];  [5];  [1];  [1]
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. J. Craig Venter Inst., Rockville, MD (United States)
  3. Univ. of Maryland School of Medicine, Baltimore, MD (United States)
  4. Broad Inst., Cambridge, MA (United States)
  5. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

In an effort to identify the best practice for finding genes in prokaryotic genomes and to propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources and used them as a basis for evaluating gene-calling methods. The peptides came from 45 bacterial replicons with average GC content ranging from 31 % to 74 %, biased toward higher-GC genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene-calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data were useful for evaluating some of the differences between gene callers, but were not reliable enough to make the results conclusive, owing to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of a sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.
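The two measurements described in the abstract, the share of each method's genes that all methods predict identically and the peptides that map outside called gene boundaries, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Gene` and `Peptide` records, the function names `consensus_fraction` and `peptides_outside_genes`, the method labels, and all coordinates are hypothetical toy values chosen for illustration. The sketch also ignores reading frame, which a real proteogenomic check would need to consider.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """A called gene (hypothetical record layout, 1-based inclusive coordinates)."""
    replicon: str
    strand: str  # "+" or "-"
    start: int   # leftmost coordinate
    stop: int    # rightmost coordinate


class Peptide(NamedTuple):
    """A peptide mapped back to genomic coordinates (same conventions as Gene)."""
    replicon: str
    strand: str
    start: int
    stop: int


def consensus_fraction(calls):
    """For each method, the fraction of its genes predicted identically by all methods.

    Two genes are identical only if replicon, strand, start, and stop all coincide.
    """
    consensus = set.intersection(*calls.values())
    return {name: len(consensus) / len(genes) for name, genes in calls.items() if genes}


def peptides_outside_genes(peptides, genes):
    """Peptides not fully contained in any called gene on the same replicon and strand."""
    return [
        p for p in peptides
        if not any(
            g.replicon == p.replicon and g.strand == p.strand
            and g.start <= p.start and p.stop <= g.stop
            for g in genes
        )
    ]


# Toy gene calls from three hypothetical methods on one replicon.
method_a = {Gene("chr", "+", 100, 400), Gene("chr", "+", 500, 900), Gene("chr", "-", 1000, 1300)}
method_b = {Gene("chr", "+", 100, 400), Gene("chr", "+", 530, 900), Gene("chr", "-", 1000, 1300)}
method_c = method_a | {Gene("chr", "+", 1400, 1600)}

fractions = consensus_fraction({"A": method_a, "B": method_b, "C": method_c})

# A peptide lying upstream of method B's start call but inside method A's gene.
peptide = Peptide("chr", "+", 505, 529)
missed_by_b = peptides_outside_genes([peptide], method_b)
missed_by_a = peptides_outside_genes([peptide], method_a)
```

In this toy data, methods A and B disagree only on the start of one gene, which is enough to remove that gene from the consensus set, and the peptide upstream of B's start call is the kind of evidence the study used to tally errors.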

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1260799
Journal Information:
Standards in Genomic Sciences, Vol. 10, Issue 1; ISSN 1944-3277
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 8 works (citation information provided by Web of Science)
