Next generation models for storage and representation of microbial biological annotation

Quest, Daniel J.; Land, Miriam L.; Brettin, Thomas S.; Cottingham, Robert W.

doi:10.1186/1471-2105-11-s6-s15


Abstract

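Background: Traditional genome annotation systems were developed in a very different computing era, one in which the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL, supported by expert manual curation. The exponential growth of sequence data drives a growing need for automatically generated annotation of ever higher quality. Typical annotation pipelines use traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g., relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and are therefore not scientifically tractable. The advent of Semantic Web standards such as the Resource Description Framework (RDF) and the OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new, comprehensive way.

Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human-readable, but also semantically aware, equivalent to GenBank/EMBL files.

Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotation and ultimately be able to help improve the systems that produce it.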
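The abstract describes TURTLE (Terse RDF Triple Language) as a human-readable, semantically aware equivalent to GenBank/EMBL files, and the conclusions stress assembling annotation from multiple databases. The Python sketch below illustrates both ideas under stated assumptions: the `ex:` namespace, its predicates (`ex:start`, `ex:end`, `ex:strand`, `ex:source`), the `Feature` class, and the site names are hypothetical and are not the ontology or code from the paper. It serializes simplified GenBank-style features as Turtle using only the standard library, and merges records from two sources by taking the set union of their triples, which is what RDF graph merging amounts to when no blank nodes are involved.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary for illustration only; this is NOT the paper's ontology.
PREFIXES = {
    "ex": "http://example.org/annotation#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

@dataclass(frozen=True)
class Feature:
    """A simplified GenBank/EMBL-style feature (hypothetical field set)."""
    locus_tag: str
    feature_type: str  # e.g. "gene" or "CDS", used as a class name in ex:
    start: int         # 1-based, inclusive, as in GenBank/EMBL locations
    end: int
    strand: str        # "+" or "-" (a simplification of complement(...))
    product: str = ""

def quote(text: str) -> str:
    """Render a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def local_name(name: str) -> str:
    """Restrict local names to a conservative character set that is always valid."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsupported local name: {name!r}")
    return name

Triple = tuple[str, str, str]

def feature_triples(f: Feature, source: str) -> set[Triple]:
    """Translate one feature into (subject, predicate, object) Turtle terms."""
    s = f"ex:{local_name(f.locus_tag)}"
    triples = {
        (s, "a", f"ex:{local_name(f.feature_type)}"),
        (s, "ex:start", str(f.start)),    # bare integers are xsd:integer in Turtle
        (s, "ex:end", str(f.end)),
        (s, "ex:strand", quote(f.strand)),
        (s, "ex:source", quote(source)),  # records which site asserted the feature
    }
    if f.product:
        triples.add((s, "rdfs:label", quote(f.product)))
    return triples

def to_turtle(triples: set[Triple]) -> str:
    """Serialize triples as Turtle, grouping predicates by subject with ';'."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in sorted(PREFIXES.items())]
    lines.append("")
    by_subject: dict[str, list[str]] = {}
    for s, p, o in sorted(triples):  # sorted for deterministic output
        by_subject.setdefault(s, []).append(f"{p} {o}")
    for s, pairs in by_subject.items():
        lines.append(f"{s} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Illustrative values, not real annotation.
    feat = Feature("demo_0001", "gene", 100, 400, "+", "hypothetical protein")
    site_a = feature_triples(feat, "siteA")
    site_b = feature_triples(feat, "siteB")
    # Without blank nodes, merging RDF graphs is a plain set union of triples.
    print(to_turtle(site_a | site_b))
```

Running the script prints a single `ex:demo_0001` block that carries both `ex:source` values, one per hypothetical site, while the triples the two sources share collapse into one copy. A production system would more likely rely on an RDF library and published ontologies (for example, Sequence Ontology terms) than on string formatting and a placeholder namespace.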

Authors:

Quest, Daniel J. [1]; Land, Miriam L. [1]; Brettin, Thomas S. [1]; Cottingham, Robert W. [1]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division

Publication Date:: October 1, 2010

Research Org.:: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Org.:: USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division

OSTI Identifier:: 1626276

Grant/Contract Number:: AC05-00OR22725

Resource Type:: Accepted Manuscript

Journal Name:: BMC Bioinformatics

Additional Journal Information:: Journal Volume: 11; Journal Issue: S6; Journal ID: ISSN 1471-2105

Publisher:: BioMed Central

Country of Publication:: United States

Language:: English

Subject:: Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Citation Formats

Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. Next generation models for storage and representation of microbial biological annotation. United States: N. p., 2010. Web. doi:10.1186/1471-2105-11-s6-s15.

Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., & Cottingham, Robert W. Next generation models for storage and representation of microbial biological annotation. United States. https://doi.org/10.1186/1471-2105-11-s6-s15

Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. 2010. "Next generation models for storage and representation of microbial biological annotation". United States. https://doi.org/10.1186/1471-2105-11-s6-s15. https://www.osti.gov/servlets/purl/1626276.

@article{osti_1626276,

  title        = {Next generation models for storage and representation of microbial biological annotation},

  author       = {Quest, Daniel J. and Land, Miriam L. and Brettin, Thomas S. and Cottingham, Robert W.},

  abstractNote = {Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. 
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.},

  doi          = {10.1186/1471-2105-11-s6-s15},

  journal      = {BMC Bioinformatics},

  number       = {S6},

  volume       = {11},

  place        = {United States},

  year         = {2010},

  month        = oct

}


Journal Article: Free Publicly Available Full Text

Accepted Manuscript (DOE)

Publisher's Version of Record: https://doi.org/10.1186/1471-2105-11-s6-s15



Works referencing / citing this record:

WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata
journal, January 2017

Putman, Tim E.; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
Database, Vol. 2017
DOI: 10.1093/database/bax025

WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata.
audiovisual, January 2017

Putman, Timothy; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
figshare
DOI: 10.6084/m9.figshare.5150065

WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata.
audiovisual, January 2017

Putman, Timothy; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
figshare
DOI: 10.6084/m9.figshare.5150065.v1

Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
journal, October 2010

Wren, Jonathan D.; Kupfer, Doris M.; Perkins, Edward J.
BMC Bioinformatics, Vol. 11, Issue S6
DOI: 10.1186/1471-2105-11-s6-s1

Similar Records in DOE PAGES and OSTI.GOV collections:

Next Generation Models for Storage and Representation of Microbial Biological Annotation

Conference: Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; ...

Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system ...

Publication and Retrieval of Computational Chemical-Physical Data Via the Semantic Web. Final Technical Report

Technical Report: Ostlund, Neil

This research showed the feasibility of applying the concepts of the Semantic Web to Computational Chemistry. We have created the first web portal (www.chemsem.com) that allows data created in the calculations of quantum chemistry, and other such chemistry calculations to be placed on the web in a way that makes the data accessible to scientists in a semantic form never before possible. The semantic web nature of the portal allows data to be searched, found, and used as an advance over the usual approach of a relational database. The semantic data on our portal has the nature of a Giant ...
https://doi.org/10.2172/1371962

Towards Cache-Enabled, Order-Aware, Ontology-Based Stream Reasoning Framework

Conference: Yan, Rui; Praggastis, Brenda L.; Smith, William P.; ...

While streaming data have become increasingly more popular in business and research communities, semantic models and processing software for streaming data have not kept pace. Traditional semantic solutions have not addressed transient data streams. Semantic web languages (e.g., RDF, OWL) have typically addressed static data settings and linked data approaches have predominantly addressed static or growing data repositories. Streaming data settings have some fundamental differences; in particular, data are consumed on the fly and data may expire. Stream reasoning, a combination of stream processing and semantic reasoning, has emerged with the vision of providing "smart" processing of streaming data. C-SPARQL ...

The NamesforLife Semantic Index of Phenotypic and Genotypic Data for Systems Biology

Technical Report: Garrity, George M.; Parker, Charles T.

Purpose of Research: The research performed by NamesforLife, LLC and Michigan State University during the development of the Semantic Index of Genotypic and Phenotypic Data for Systems Biology addresses several key aspects of the “scientific reproducibility crisis” facing the field of microbiology and the scholarly publishing industry. Research Carried Out in this Project: During the course of this project, a new method of Knowledge Organization was investigated for ontology and thesaurus construction, machine learning software was developed for Information Extraction (IE), and an extensive curatorial effort was undertaken to produce a lexicon of phenotypic terms that is backed by both ...

Retaining Systems Engineering Model Meaning Through Transformation: Demo 2

Technical Report: Carroll, Edward Ralph; Jarosz, Jason P.; Tafoya, Carlos Jerome; ...

Digital engineering strategies typically assume that digital engineering models interoperate seamlessly across the multiple different engineering modeling software applications involved, such as model-based systems engineering (MBSE), mechanical computer-aided design (MCAD), electrical computer-aided design (ECAD), and other engineering modeling applications. The presumption is that the data schema in these modeling software applications are structured in the familiar flat-tabular schema like any other software application. Engineering domain-specific applications (e.g., systems, mechanical, electrical, simulation) are typically designed to solve domain-specific problems, necessarily excluding explicit representations of non-domain information to help the engineer focus on the domain problems (system definition, design, simulation). ...
https://doi.org/10.2172/1770261
