Next generation models for storage and representation of microbial biological annotation
Abstract
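Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.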
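The idea of expressing a GenBank/EMBL-style feature record as RDF triples in Turtle syntax can be sketched as follows. This is a minimal illustration only, not the authors' framework: the `ex:` namespace, the predicate names, the helper functions `literal` and `feature_to_turtle`, and the gene record are all hypothetical and chosen for readability.

```python
# Hypothetical sketch: the namespace, predicate names, and the sample
# feature are illustrative assumptions, not taken from the paper.
EX_NAMESPACE = "http://example.org/annotation#"


def literal(value):
    """Render a Python value as a Turtle literal (integers bare, text quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def feature_to_turtle(feature_id, rdf_type, properties):
    """Serialize one annotated feature as a small Turtle document.

    feature_id and rdf_type are assumed to be simple names that are
    valid Turtle local names (letters, digits, underscores).
    """
    lines = [f"@prefix ex: <{EX_NAMESPACE}> .", ""]
    items = list(properties.items())
    # The subject line ends the statement immediately if there are no properties.
    subject_end = " ;" if items else " ."
    lines.append(f"ex:{feature_id} a ex:{rdf_type}{subject_end}")
    for index, (predicate, value) in enumerate(items):
        end = " ." if index == len(items) - 1 else " ;"
        lines.append(f"    ex:{predicate} {literal(value)}{end}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative feature, analogous to one CDS entry in a flat file.
    print(
        feature_to_turtle(
            "gene0001",
            "Gene",
            {"start": 190, "end": 255, "strand": "+", "product": "leader peptide"},
        )
    )
```

Each property becomes a subject-predicate-object statement, so the same record stays readable as text while remaining parseable by standard RDF tools. A real system would use ontology terms instead of the ad hoc `ex:` predicates assumed here.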
- Authors:
- Quest, Daniel J.; Land, Miriam L.; Brettin, Thomas S.; Cottingham, Robert W.
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division
- Publication Date:
- 2010-10-01
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
- OSTI Identifier:
- 1626276
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Bioinformatics
- Additional Journal Information:
- Journal Volume: 11; Journal Issue: S6; Journal ID: ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
Citation Formats
Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. Next generation models for storage and representation of microbial biological annotation. United States: N. p., 2010. Web. doi:10.1186/1471-2105-11-s6-s15.
Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., & Cottingham, Robert W. Next generation models for storage and representation of microbial biological annotation. United States. https://doi.org/10.1186/1471-2105-11-s6-s15
Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. 2010.
"Next generation models for storage and representation of microbial biological annotation". United States. https://doi.org/10.1186/1471-2105-11-s6-s15. https://www.osti.gov/servlets/purl/1626276.
@article{osti_1626276,
title = {Next generation models for storage and representation of microbial biological annotation},
author = {Quest, Daniel J. and Land, Miriam L. and Brettin, Thomas S. and Cottingham, Robert W.},
abstractNote = {Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. 
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.},
doi = {10.1186/1471-2105-11-s6-s15},
journal = {BMC Bioinformatics},
number = {S6},
volume = 11,
place = {United States},
year = {2010},
month = {10}
}
Works referenced in this record:
Genome re-annotation: a wiki solution?
journal, January 2007
- Salzberg, Steven L.
- Genome Biology, Vol. 8, Issue 1
The Distributed Annotation System
journal, January 2001
- Dowell, Robin D.; Jokerst, Rodney M.; Day, Allen
- BMC Bioinformatics, Vol. 2, Issue 1, p. 7
The Distributed Annotation System for Integration of Biological Data
book, January 2006
- Prlić, Andreas; Birney, Ewan; Cox, Tony
- Lecture Notes in Computer Science
Gene Ontology: tool for the unification of biology
journal, May 2000
- Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.
- Nature Genetics, Vol. 25, Issue 1
From SHIQ and RDF to OWL: the making of a Web Ontology Language
journal, December 2003
- Horrocks, Ian; Patel-Schneider, Peter F.; van Harmelen, Frank
- Journal of Web Semantics, Vol. 1, Issue 1
OWL 2: The next step for OWL
journal, November 2008
- Grau, Bernardo Cuenca; Horrocks, Ian; Motik, Boris
- Journal of Web Semantics, Vol. 6, Issue 4
SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services
journal, September 2009
- Gessler, Damian DG; Schiltz, Gary S.; May, Greg D.
- BMC Bioinformatics, Vol. 10, Issue 1
A Chado case study: an ontology-based modular schema for representing genome-associated biological information
journal, July 2007
- Mungall, Christopher J.; Emmert, David B.
- Bioinformatics, Vol. 23, Issue 13
The Generic Genome Browser: A Building Block for a Model Organism System Database
journal, October 2002
- Stein, L. D.
- Genome Research, Vol. 12, Issue 10
The EMBL Nucleotide Sequence Database
journal, January 2002
- Stoesser, G.
- Nucleic Acids Research, Vol. 30, Issue 1
KEGG: Kyoto Encyclopedia of Genes and Genomes
journal, January 2000
- Kanehisa, Minoru; Goto, Susumu
- Nucleic Acids Research, Vol. 28, Issue 1, p. 27-30
Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010
- Hyatt, Doug; Chen, Gwo-Liang; LoCascio, Philip F.
- BMC Bioinformatics, Vol. 11, Issue 1
Creating Semantic Web contents with Protege-2000
journal, March 2001
- Noy, N. F.; Sintek, M.; Decker, S.
- IEEE Intelligent Systems, Vol. 16, Issue 2
The Bioperl Toolkit: Perl Modules for the Life Sciences
journal, October 2002
- Stajich, J. E.
- Genome Research, Vol. 12, Issue 10
The Semantic Web
journal, May 2001
- Berners-Lee, Tim; Hendler, James; Lassila, Ora
- Scientific American, Vol. 284, Issue 5
An Evidence Ontology for use in Pathway/Genome Databases
conference, December 2003
- Karp, P. D.; Paley, S.; Krieger, C. J.
- Biocomputing 2004
Advancing translational research with the Semantic Web
journal, January 2007
- Ruttenberg, Alan; Clark, Tim; Bug, William
- BMC Bioinformatics, Vol. 8, Issue Suppl 3
Model storage, exchange and integration
journal, October 2006
- Le Novère, Nicolas
- BMC Neuroscience, Vol. 7, Issue S1
GMODWeb: a web framework for the generic model organism database
journal, January 2008
- O'Connor, Brian D.; Day, Allen; Cain, Scott
- Genome Biology, Vol. 9, Issue 6
Initial Implementation of a Comparative Data Analysis Ontology
journal, January 2009
- Prosdocimi, Francisco; Chisham, Brandon; Pontelli, Enrico
- Evolutionary Bioinformatics, Vol. 5
Works referencing / citing this record:
WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata
journal, January 2017
- Putman, Tim E.; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
- Database, Vol. 2017
WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata.
audiovisual, January 2017
- Putman, Timothy; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
- figshare
Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
journal, October 2010
- Wren, Jonathan D.; Kupfer, Doris M.; Perkins, Edward J.
- BMC Bioinformatics, Vol. 11, Issue S6