Title: Toward a standard in structural genome annotation for prokaryotes

In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources, and these were used as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with an average GC content from 31 % to 74 %, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene-calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data were useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in the accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of a sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.
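The two comparisons described in the abstract, the consensus of identical gene calls and the tally of peptides mapped outside called genes, can be illustrated with a minimal Python sketch. It is not the study's software. The `Interval` type, the function names, the method labels, and the toy coordinates are all hypothetical, and the sketch assumes every feature is stored as left and right coordinates on a named replicon and strand.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A gene call or a mapped peptide (hypothetical representation)."""
    replicon: str
    strand: str
    start: int  # leftmost coordinate (assumption: start <= stop on both strands)
    stop: int   # rightmost coordinate


def consensus_fraction(calls):
    """Return the genes called identically by every method, plus the share
    of each method's calls that falls in that consensus set.

    Two calls count as identical only if replicon, strand, start, and stop
    all coincide, which is what tuple equality checks here.
    """
    consensus = set.intersection(*calls.values())
    fractions = {m: len(consensus) / len(g) for m, g in calls.items() if g}
    return consensus, fractions


def peptides_outside(peptides, genes):
    """Return peptides whose mapped span is not fully contained in any
    called gene on the same replicon and strand."""
    return [
        p for p in peptides
        if not any(
            g.replicon == p.replicon and g.strand == p.strand
            and g.start <= p.start and p.stop <= g.stop
            for g in genes
        )
    ]


# Toy data, invented purely for illustration.
g1 = Interval("chr", "+", 100, 400)
g2 = Interval("chr", "-", 500, 900)
calls = {
    "method_a": {g1, g2, Interval("chr", "+", 1000, 1300)},
    "method_b": {g1, g2, Interval("chr", "+", 1030, 1300)},  # different start
    "method_c": {g1, g2, Interval("chr", "+", 1500, 1800)},
}
consensus, fractions = consensus_fraction(calls)

peptides = [
    Interval("chr", "+", 150, 180),    # inside g1
    Interval("chr", "+", 1005, 1035),  # upstream of method_b's start
]
missed_by_b = peptides_outside(peptides, calls["method_b"])
```

In this toy case the third gene is called with three different coordinate sets, so it drops out of the consensus, and the second peptide lies partly upstream of the start chosen by `method_b`, which is the kind of evidence the abstract describes as a peptide mapped outside the boundaries of a called gene.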
 [1] ;  [2] ;  [3] ;  [4] ;  [1] ;  [1] ;  [1] ;  [5] ;  [1] ;  [1]
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. J. Craig Venter Inst., Rockville, MD (United States)
  3. Univ. of Maryland School of Medicine, Baltimore, MD (United States)
  4. Broad Inst., Cambridge, MA (United States)
  5. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Publication Date:
Grant/Contract Number:
Resource Type:
Accepted Manuscript
Journal Name:
Standards in Genomic Sciences
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1944-3277
Publisher:
BioMed Central
Research Org:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Country of Publication:
United States
OSTI Identifier: