Next Generation Models for Storage and Representation of Microbial Biological Annotation

Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; Cottingham, Robert W

Conference · January 1, 2010

OSTI ID:993783

Affiliation (all authors): ORNL

Background: Traditional genome annotation systems were developed in a very different computing era, when the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL, supported by expert manual curation. The exponential growth of sequence data drives a growing need for higher-quality, automatically generated annotation. Typical annotation pipelines use traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g., relational database systems and schemas). This coupling makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and are therefore not scientifically tractable. The advent of Semantic Web standards such as the Resource Description Framework (RDF) and the OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new, comprehensive way.

Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how Turtle (Terse RDF Triple Language) can be used as a human-readable, but also semantically aware, equivalent to GenBank/EMBL files.

Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotation and will ultimately be able to help improve the systems that produce it.

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Laboratory Directed Research and Development (LDRD) Program; Work for Others (WFO); USDOE Office of Science (SC)

DOE Contract Number: DE-AC05-00OR22725

OSTI ID: 993783

Resource Relation: Conference: MCBIOS, Jonesboro, AR, USA, February 19, 2010

Country of Publication: United States

Language: English

Similar Records

Next generation models for storage and representation of microbial biological annotation

Journal Article · October 1, 2010 · BMC Bioinformatics · OSTI ID:993783

Quest, Daniel J.; Land, Miriam L.; Brettin, Thomas S.; +1 more

Publication and Retrieval of Computational Chemical-Physical Data Via the Semantic Web. Final Technical Report

Technical Report · July 20, 2017 · OSTI ID:993783

Ostlund, Neil

Towards Cache-Enabled, Order-Aware, Ontology-Based Stream Reasoning Framework

Conference · August 16, 2016 · OSTI ID:993783

Yan, Rui; Praggastis, Brenda L.; Smith, William P.; +1 more

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
GENES
MODIFICATIONS
PIPELINES
STORAGE
COMPUTERS
PROGRAMMING LANGUAGES