Toward a standard in structural genome annotation for prokaryotes
Abstract
In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources, and these were used as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with an average GC content from 31 % to 74 %, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data was useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in the accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.
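The two measurements named in the abstract (the consensus of identically called genes, and peptides that map outside called gene boundaries) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Interval` type, the function names, the method labels, and the toy coordinates are all hypothetical, and a real analysis would also need to check reading frame and handle genes that wrap around a circular replicon.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Interval(NamedTuple):
    """A gene call or a mapped peptide (hypothetical, simplified model)."""
    replicon: str
    start: int   # 1-based, inclusive
    stop: int    # 1-based, inclusive
    strand: str  # "+" or "-"


def consensus(call_sets: Iterable[Iterable[Interval]]) -> Set[Interval]:
    """Genes called identically (same replicon, start, stop, strand) by every method."""
    sets = [set(calls) for calls in call_sets]
    return set.intersection(*sets)


def consensus_fraction(calls_by_method: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Share of each method's calls that belong to the all-method consensus."""
    shared = consensus(calls_by_method.values())
    return {
        name: len(shared) / len(calls)
        for name, calls in calls_by_method.items()
        if calls  # skip methods with no calls to avoid dividing by zero
    }


def peptide_outside_genes(peptide: Interval, genes: Iterable[Interval]) -> bool:
    """True if no called gene on the same replicon and strand fully contains the peptide."""
    return not any(
        g.replicon == peptide.replicon
        and g.strand == peptide.strand
        and g.start <= peptide.start
        and peptide.stop <= g.stop
        for g in genes
    )


# Toy data: three hypothetical gene callers on one replicon.
calls = {
    "method_a": [
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 500, 900, "-"),
        Interval("rep1", 1000, 1300, "+"),
    ],
    "method_b": [
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 530, 900, "-"),   # same stop, different start
        Interval("rep1", 1000, 1300, "+"),
    ],
    "method_c": [
        Interval("rep1", 100, 400, "+"),
        Interval("rep1", 500, 900, "-"),
        Interval("rep1", 1000, 1300, "+"),
        Interval("rep1", 1500, 1700, "+"),  # extra call unique to this method
    ],
}

fractions = consensus_fraction(calls)

# A peptide mapped near the disputed start of the minus-strand gene.
peptide = Interval("rep1", 505, 525, "-")
flagged = {name: peptide_outside_genes(peptide, genes) for name, genes in calls.items()}
```

In this toy case the peptide is contained in the gene as called by `method_a` and `method_c` but falls outside the shorter call from `method_b`, which is the kind of evidence the abstract describes tallying as a gene-calling error.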
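- Authors:
- Tripp, H. James; Sutton, Granger; White, Owen; Wortman, Jennifer; Pati, Amrita; Mikhailova, Natalia; Ovchinnikova, Galina; Payne, Samuel H.; Kyrpides, Nikos C.; Ivanova, Natalia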
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- J. Craig Venter Inst., Rockville, MD (United States)
- Univ. of Maryland School of Medicine, Baltimore, MD (United States)
- Broad Inst., Cambridge, MA (United States)
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Publication Date:
- 2015-07-25
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1260799
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Standards in Genomic Sciences
- Additional Journal Information:
- Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1944-3277
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Tripp, H. James, Sutton, Granger, White, Owen, Wortman, Jennifer, Pati, Amrita, Mikhailova, Natalia, Ovchinnikova, Galina, Payne, Samuel H., Kyrpides, Nikos C., and Ivanova, Natalia. Toward a standard in structural genome annotation for prokaryotes. United States: N. p., 2015.
Web. doi:10.1186/s40793-015-0034-9.
Tripp, H. James, Sutton, Granger, White, Owen, Wortman, Jennifer, Pati, Amrita, Mikhailova, Natalia, Ovchinnikova, Galina, Payne, Samuel H., Kyrpides, Nikos C., & Ivanova, Natalia. Toward a standard in structural genome annotation for prokaryotes. United States. https://doi.org/10.1186/s40793-015-0034-9
Tripp, H. James, Sutton, Granger, White, Owen, Wortman, Jennifer, Pati, Amrita, Mikhailova, Natalia, Ovchinnikova, Galina, Payne, Samuel H., Kyrpides, Nikos C., and Ivanova, Natalia. 2015.
"Toward a standard in structural genome annotation for prokaryotes". United States. https://doi.org/10.1186/s40793-015-0034-9. https://www.osti.gov/servlets/purl/1260799.
@article{osti_1260799,
title = {Toward a standard in structural genome annotation for prokaryotes},
author = {Tripp, H. James and Sutton, Granger and White, Owen and Wortman, Jennifer and Pati, Amrita and Mikhailova, Natalia and Ovchinnikova, Galina and Payne, Samuel H. and Kyrpides, Nikos C. and Ivanova, Natalia},
abstractNote = {In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources, and these were used as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with an average GC content from 31 % to 74 %, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data was useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in the accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.},
doi = {10.1186/s40793-015-0034-9},
journal = {Standards in Genomic Sciences},
number = 1,
volume = 10,
place = {United States},
year = {2015},
month = {7}
}
Works referenced in this record:
Genome analysis and genome-wide proteomics of Thermococcus gammatolerans, the most radioresistant organism known amongst the Archaea
journal, January 2009
- Zivanovic, Yvan; Armengaud, Jean; Lagorce, Arnaud
- Genome Biology, Vol. 10, Issue 6
Validating divergent ORF annotation of the Mycobacterium leprae genome through a full translation data set and peptide identification by tandem mass spectrometry
journal, June 2009
- de Souza, Gustavo A.; Søfteland, Tina; Koehler, Christian J.
- PROTEOMICS, Vol. 9, Issue 12
Proteomics-based Refinement of Deinococcus deserti Genome Annotation Reveals an Unwonted Use of Non-canonical Translation Initiation Codons
journal, October 2009
- Baudet, Mathieu; Ortet, Philippe; Gaillard, Jean-Charles
- Molecular & Cellular Proteomics, Vol. 9, Issue 2
Identifying bacterial genes and endosymbiont DNA with Glimmer
journal, January 2007
- Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.
- Bioinformatics, Vol. 23, Issue 6
The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification
journal, October 2014
- Reddy, T. B. K.; Thomas, Alex D.; Stamatis, Dimitri
- Nucleic Acids Research, Vol. 43, Issue D1
Improving gene annotation using peptide mass spectrometry
journal, January 2007
- Tanner, S.; Shen, Z.; Ng, J.
- Genome Research, Vol. 17, Issue 2
Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream
journal, July 2009
- Kyrpides, Nikos C.
- Nature Biotechnology, Vol. 27, Issue 7
Proteogenomics to discover the full coding content of genomes: A computational perspective
journal, October 2010
- Castellana, Natalie; Bafna, Vineet
- Journal of Proteomics, Vol. 73, Issue 11
RefSeq microbial genomes database: new representation and annotation strategy
journal, December 2013
- Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris
- Nucleic Acids Research, Vol. 42, Issue D1
GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes
journal, May 2010
- Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia
- Nature Methods, Vol. 7, Issue 6
Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study
journal, November 2011
- Venter, Eli; Smith, Richard D.; Payne, Samuel H.
- PLoS ONE, Vol. 6, Issue 11
Gene Identification in Prokaryotic Genomes, Phages, Metagenomes, and EST Sequences with GeneMarkS Suite
journal, February 2014
- Borodovsky, Mark; Lomsadze, Alex
- Current Protocols in Microbiology, Vol. 32, Issue 1
Ortho-proteogenomics: Multiple proteomes investigation through orthology and a new MS-based protocol
journal, October 2008
- Gallien, S.; Perrodou, E.; Carapito, C.
- Genome Research, Vol. 19, Issue 1
Expanding the Known Repertoire of Virulence Factors Produced by Bacillus cereus through Early Secretome Profiling in Three Redox Conditions
journal, April 2010
- Clair, Gérémy; Roussi, Stamatiki; Armengaud, Jean
- Molecular & Cellular Proteomics, Vol. 9, Issue 7
Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010
- Hyatt, Doug; Chen, Gwo-Liang; LoCascio, Philip F.
- BMC Bioinformatics, Vol. 11, Issue 1
Heterocyst Pattern Formation Controlled by a Diffusible Peptide
journal, October 1998
- Yoon, H.
- Science, Vol. 282, Issue 5390
The Proteomics Identifications (PRIDE) database and associated tools: status in 2013
journal, November 2012
- Vizcaíno, Juan Antonio; Côté, Richard G.; Csordas, Attila
- Nucleic Acids Research, Vol. 41, Issue D1
Works referencing / citing this record:
AnnoTree: visualization and exploration of a functionally annotated microbial tree of life
journal, April 2019
- Mendler, Kerrin; Chen, Han; Parks, Donovan H.
- Nucleic Acids Research, Vol. 47, Issue 9
Occurrence and expression of genes encoding methyl-compound production in rumen bacteria
journal, November 2019
- Kelly, William J.; Leahy, Sinead C.; Kamke, Janine
- Animal Microbiome, Vol. 1, Issue 1
Methods, Tools and Current Perspectives in Proteogenomics
journal, April 2017
- Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing
- Molecular & Cellular Proteomics, Vol. 16, Issue 6
Comparative Genomics of Rumen Butyrivibrio spp. Uncovers a Continuum of Polysaccharide-Degrading Capabilities
journal, October 2019
- Palevich, Nikola; Kelly, William J.; Leahy, Sinead C.
- Applied and Environmental Microbiology, Vol. 86, Issue 1
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life
journal, June 2017
- Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.
- Nature Biotechnology, Vol. 35, Issue 7
Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection
journal, March 2018
- Seshadri, Rekha; Leahy, Sinead C.; Attwood, Graeme T.
- Nature Biotechnology, Vol. 36, Issue 4
Persistence of Functional Protein Domains in Mycoplasma Species and their Role in Host Specificity and Synthetic Minimal Life
journal, February 2017
- Kamminga, Tjerko; Koehorst, Jasper J.; Vermeij, Paul
- Frontiers in Cellular and Infection Microbiology, Vol. 7