Toward a standard in structural genome annotation for prokaryotes
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- J. Craig Venter Inst., Rockville, MD (United States)
- Univ. of Maryland School of Medicine, Baltimore, MD (United States)
- Broad Inst., Cambridge, MA (United States)
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources and used them as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with average GC content ranging from 31 % to 74 %, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene-calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data were useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of a sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.
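The two comparisons described in the abstract (the consensus set of identical gene calls, and peptides mapped outside called gene boundaries) can be illustrated with a minimal Python sketch. This is not the software used in the study. The `Interval` type, the function names, the caller labels (`caller_a`, `caller_b`, `caller_c`), and all coordinates are hypothetical toy values, and the sketch assumes genes and peptide mappings are already available as leftmost/rightmost genomic coordinates on a named replicon and strand.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Interval(NamedTuple):
    """A gene call or a mapped peptide (hypothetical representation)."""
    replicon: str
    left: int     # leftmost genomic coordinate
    right: int    # rightmost genomic coordinate
    strand: str   # "+" or "-"


def consensus_fraction(
    predictions: Dict[str, Set[Interval]],
) -> Tuple[Set[Interval], Dict[str, float]]:
    """Return genes called identically by every method (both ends and
    strand must coincide) and, per method, the share of its calls that
    fall in that consensus set."""
    consensus = set.intersection(*predictions.values())
    fractions = {
        name: (len(consensus) / len(genes) if genes else 0.0)
        for name, genes in predictions.items()
    }
    return consensus, fractions


def peptide_is_covered(peptide: Interval, genes: Set[Interval]) -> bool:
    """True if the peptide lies entirely inside some called gene on the
    same replicon and strand. Reading frame is ignored in this sketch."""
    return any(
        g.replicon == peptide.replicon
        and g.strand == peptide.strand
        and g.left <= peptide.left
        and peptide.right <= g.right
        for g in genes
    )


# Toy gene calls (made-up coordinates, not data from the study).
calls = {
    "caller_a": {
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 500, 900, "-"),
        Interval("rep1", 1000, 1300, "+"),
    },
    "caller_b": {
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 530, 900, "-"),   # different start on the "-" gene
        Interval("rep1", 1000, 1300, "+"),
    },
    "caller_c": {
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 500, 900, "-"),
        Interval("rep1", 1000, 1300, "+"),
        Interval("rep1", 1400, 1600, "+"),  # extra call unique to caller_c
    },
}

consensus, fractions = consensus_fraction(calls)

# A toy peptide that maps just inside the longer version of the "-" gene.
peptide = Interval("rep1", 505, 525, "-")
outside = {name: not peptide_is_covered(peptide, genes)
           for name, genes in calls.items()}
```

In this toy example the peptide falls outside the boundaries of the gene as called by `caller_b` only, which is the kind of evidence the abstract describes for tallying errors. Requiring exact coincidence of both ends is a strict criterion; a looser one (for example, matching only the stop coordinate) would yield a larger consensus set.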
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1260799
- Journal Information:
- Standards in Genomic Sciences, Vol. 10, Issue 1; ISSN 1944-3277
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English