DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Next generation models for storage and representation of microbial biological annotation

Abstract

Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way.
Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files.
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.
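
To make the abstract's TURTLE claim concrete, the following is a minimal sketch of how one gene feature of the kind found in a GenBank/EMBL feature table might be written out as Turtle triples. It is not the authors' framework: the `ex:` namespace, the class `ex:Gene`, the properties `ex:start`, `ex:end`, and `ex:strand`, and the locus tag `GENE_0001` are all hypothetical names chosen for illustration, and the sketch uses only the Python standard library.

```python
# Minimal sketch: serialize one gene feature as Turtle text.
# ASSUMPTION: the "ex:" vocabulary (ex:Gene, ex:start, ex:end, ex:strand)
# is hypothetical and is not defined by the paper described in this record.

PREFIXES = {
    "ex": "http://example.org/annotation#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def gene_to_turtle(gene):
    """Return a Turtle document describing a single gene feature.

    `gene` is a dict with the keys locus_tag, product, start, end, strand.
    """
    subject = "ex:" + gene["locus_tag"]
    label = escape_literal(gene["product"])
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    lines.append(f"{subject} a ex:Gene ;")
    lines.append(f'    rdfs:label "{label}" ;')
    # Bare integers are valid Turtle shorthand for xsd:integer literals.
    lines.append(f"    ex:start {int(gene['start'])} ;")
    lines.append(f"    ex:end {int(gene['end'])} ;")
    lines.append(f'    ex:strand "{escape_literal(gene["strand"])}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "locus_tag": "GENE_0001",  # hypothetical identifier
        "product": "hypothetical protein",
        "start": 190,
        "end": 255,
        "strand": "+",
    }
    print(gene_to_turtle(example))
```

Because each statement is a subject-predicate-object triple with full IRIs behind the prefixes, records produced this way at different sites could in principle be merged by an RDF store without a shared relational schema, which is the decoupling the abstract argues for.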

Authors:
Quest, Daniel J. [1]; Land, Miriam L. [1]; Brettin, Thomas S. [1]; Cottingham, Robert W. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division
Publication Date:
2010-10-01
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
OSTI Identifier:
1626276
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 11; Journal Issue: S6; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Citation Formats

Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. Next generation models for storage and representation of microbial biological annotation. United States: N. p., 2010. Web. doi:10.1186/1471-2105-11-s6-s15.
Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., & Cottingham, Robert W. (2010). Next generation models for storage and representation of microbial biological annotation. United States. https://doi.org/10.1186/1471-2105-11-s6-s15
Quest, Daniel J., Land, Miriam L., Brettin, Thomas S., and Cottingham, Robert W. 2010. "Next generation models for storage and representation of microbial biological annotation". United States. https://doi.org/10.1186/1471-2105-11-s6-s15. https://www.osti.gov/servlets/purl/1626276.
@article{osti_1626276,
title = {Next generation models for storage and representation of microbial biological annotation},
author = {Quest, Daniel J. and Land, Miriam L. and Brettin, Thomas S. and Cottingham, Robert W.},
abstractNote = {Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. 
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.},
doi = {10.1186/1471-2105-11-s6-s15},
journal = {BMC Bioinformatics},
number = {S6},
volume = {11},
place = {United States},
year = {2010},
month = {10}
}

Works referencing / citing this record:

WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata
journal, January 2017

  • Putman, Tim E.; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian
  • Database, Vol. 2017
  • DOI: 10.1093/database/bax025

WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata.
audiovisual, January 2017


Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference
journal, October 2010