Next generation models for storage and representation of microbial biological annotation

Quest, Daniel J.; Land, Miriam L.; Brettin, Thomas S.; Cottingham, Robert W.

doi:10.1186/1471-2105-11-s6-s15

Journal Article · October 1, 2010 · BMC Bioinformatics

DOI: https://doi.org/10.1186/1471-2105-11-s6-s15 · OSTI ID: 1626276

Quest, Daniel J. [1]; Land, Miriam L. [1]; Brettin, Thomas S. [1]; Cottingham, Robert W. [1]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division

Background: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way.

Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files.

Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.
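
The abstract's idea of TURTLE as a readable counterpart to a GenBank/EMBL feature record can be pictured with a small sketch. The Python below is not the authors' framework: the `ex:` namespace, the predicate names (`ex:start`, `ex:end`, `ex:strand`), the class `ex:CDS`, and the sample locus tag are all hypothetical placeholders chosen for illustration, and the feature is a plain dictionary in place of a parsed GenBank entry.

```python
# Illustrative only: every URI, prefix, predicate, and value below is a
# hypothetical placeholder, not taken from the paper or from any ontology.
PREFIXES = {
    "ex": "http://example.org/annotation/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def feature_to_turtle(feature):
    """Render one GenBank-style CDS feature (a plain dict) as a Turtle block."""
    subject = "ex:" + feature["locus_tag"]
    product = escape_literal(feature["product"])
    predicates = [
        "a ex:CDS",
        f'rdfs:label "{product}"',
        f'ex:start "{feature["start"]}"^^xsd:integer',
        f'ex:end "{feature["end"]}"^^xsd:integer',
        f'ex:strand "{feature["strand"]}"',
    ]
    # Predicates sharing a subject are joined with ';' and closed with '.'.
    body = " ;\n    ".join(predicates)
    return f"{subject}\n    {body} ."


def to_turtle(features):
    """Emit the prefix declarations followed by one block per feature."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items())
    blocks = "\n\n".join(feature_to_turtle(f) for f in features)
    return f"{header}\n\n{blocks}\n"


# Made-up sample feature, standing in for one CDS entry of a flat file.
example = [
    {
        "locus_tag": "GENE_0001",
        "product": "hypothetical protein",
        "start": 190,
        "end": 255,
        "strand": "+",
    }
]
turtle_text = to_turtle(example)
print(turtle_text)
```

Each printed statement is a subject, predicate, object triple, so the same text that a person can read line by line could also be loaded by an RDF parser and linked to ontology terms, which is the decoupling from database schemas that the abstract describes.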

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division

Grant/Contract Number: AC05-00OR22725

OSTI ID: 1626276

Journal Information: BMC Bioinformatics, Vol. 11, Issue S6; ISSN 1471-2105

Publisher: BioMed Central

Country of Publication: United States

Language: English

Related Subjects

Biochemistry & Molecular Biology
Biotechnology & Applied Microbiology
Mathematical & Computational Biology