Title: Next generation models for storage and representation of microbial biological annotation

Journal Article · · BMC Bioinformatics
 [1];  [1];  [1];  [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division

Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way.
Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files.
Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.
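As a rough illustration of the idea in the Results, the sketch below writes a single gene feature as Turtle text, much as a GenBank/EMBL feature table entry would list it. It is not the article's framework: the `ex:` namespace, the predicate names (`ex:start`, `ex:end`, `ex:strand`, `ex:product`), the `GeneFeature` class and the locus tag are all hypothetical, chosen only to show the shape of such a record.

```python
from dataclasses import dataclass

# Hypothetical namespaces for illustration only; the article's actual
# ontology and vocabulary are not reproduced here.
PREFIXES = {
    "ex": "http://example.org/annotation#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


@dataclass
class GeneFeature:
    """Minimal stand-in for one gene entry of a feature table."""
    locus_tag: str  # assumed to be a valid Turtle local name (letters, digits, "_")
    start: int
    end: int
    strand: str
    product: str


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(feature: GeneFeature) -> str:
    """Serialize one feature as a Turtle document (prefixes plus one subject)."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    lines.append(f"ex:{feature.locus_tag} a ex:Gene ;")
    lines.append(f'    rdfs:label "{escape_literal(feature.locus_tag)}" ;')
    # Bare integers are valid Turtle shorthand for xsd:integer literals.
    lines.append(f"    ex:start {feature.start} ;")
    lines.append(f"    ex:end {feature.end} ;")
    lines.append(f'    ex:strand "{escape_literal(feature.strand)}" ;')
    lines.append(f'    ex:product "{escape_literal(feature.product)}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example values, not data from the article.
    gene = GeneFeature("GENE_0001", 190, 255, "+", "hypothetical protein")
    print(to_turtle(gene))
```

Because each statement is a subject, predicate, object triple with a resolvable prefix, the same text can be read by a person and loaded into any RDF store, which is the sense in which such a file could stand in for a flat GenBank/EMBL record.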

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1626276
Journal Information:
BMC Bioinformatics, Vol. 11, Issue S6; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English

Cited By (4)

WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata journal January 2017
WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata. audiovisual January 2017
Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference journal October 2010