DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Abstract

Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Previously, when using Semantic Web technologies to query biological annotations, there was no standard for describing the potentially complex location information of such annotations as subject-predicate-object triples. In this paper, we present an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Representing sequence positions in one data format, independent of file formats, allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
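
To make the idea of describing a feature location as subject-predicate-object triples concrete, the following is a minimal sketch, not the authors' implementation. It assumes the FALDO namespace `http://biohackathon.org/resource/faldo#` and the terms `Region`, `ExactPosition`, `ForwardStrandPosition`, `ReverseStrandPosition`, `location`, `begin`, `end`, `position`, and `reference`. The function names, the `#region`, `#begin`, and `#end` node suffixes, and the `example.org` IRIs are hypothetical and chosen only for illustration.

```python
# Sketch only: emits N-Triples lines for one feature located on a reference sequence.
FALDO = "http://biohackathon.org/resource/faldo#"  # assumed FALDO namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value):
    """Wrap an IRI string in angle brackets (N-Triples term syntax)."""
    return f"<{value}>"


def region_triples(feature, begin, end, reference, forward=True):
    """Return (subject, predicate, object) triples locating `feature`.

    The node names derived from `feature` are hypothetical; they assume
    the feature IRI has no fragment of its own.
    """
    strand = "ForwardStrandPosition" if forward else "ReverseStrandPosition"
    region = feature + "#region"
    begin_node = feature + "#begin"
    end_node = feature + "#end"

    # The feature points at a region, and the region at its two boundaries.
    triples = [
        (iri(feature), iri(FALDO + "location"), iri(region)),
        (iri(region), iri(RDF_TYPE), iri(FALDO + "Region")),
        (iri(region), iri(FALDO + "begin"), iri(begin_node)),
        (iri(region), iri(FALDO + "end"), iri(end_node)),
    ]
    # Each boundary is an exact, stranded position on the reference sequence.
    for node, coordinate in ((begin_node, begin), (end_node, end)):
        triples += [
            (iri(node), iri(RDF_TYPE), iri(FALDO + "ExactPosition")),
            (iri(node), iri(RDF_TYPE), iri(FALDO + strand)),
            (iri(node), iri(FALDO + "position"), f'"{coordinate}"^^<{XSD_INTEGER}>'),
            (iri(node), iri(FALDO + "reference"), iri(reference)),
        ]
    return triples


def to_ntriples(triples):
    """Serialize triples as one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(region_triples(
        "http://example.org/feature/gene1",   # hypothetical feature IRI
        100,
        250,
        "http://example.org/sequence/chr1",   # hypothetical reference sequence IRI
    )))
```

Because every position carries its own reference sequence and strand, output of this shape from different sources can be loaded into one triple store and queried together, which is the integration scenario the abstract describes.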

Authors:
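Bolleman, Jerven T. [1]; Mungall, Christopher J. [2]; Strozzi, Francesco [3]; Baran, Joachim [4]; Dumontier, Michel [5]; Bonnal, Raoul J. P. [6]; Buels, Robert [7]; Hoehndorf, Robert [8]; Fujisawa, Takatomo [9]; Katayama, Toshiaki [10]; Cock, Peter J. A. [11]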
  1. SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  3. Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
  4. CODAMONO, Toronto, ON (Canada)
  5. Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
  6. National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
  7. Univ. of California, Berkeley, CA (United States)
  8. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
  9. National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
  10. Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
  11. The James Hutton Inst., Dundee, Scotland (United Kingdom)
Publication Date:
June 2016
Research Org.:
Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
OSTI Identifier:
1299144
Alternate Identifier(s):
OSTI ID: 1299149; OSTI ID: 1379397
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Biomedical Semantics
Additional Journal Information:
Journal Volume: 7; Journal ID: ISSN 2041-1480
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; SPARQL; RDF; Semantic Web; Standardisation; Sequence ontology; Annotation; Data integration; Sequence feature

Citation Formats

Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States: N. p., 2016. Web. doi:10.1186/s13326-016-0067-z.
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, & Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States. doi:10.1186/s13326-016-0067-z.
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. 2016. "FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation". United States. doi:10.1186/s13326-016-0067-z. https://www.osti.gov/servlets/purl/1299144.
@article{osti_1299144,
title = {FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation},
author = {Bolleman, Jerven T. and Mungall, Christopher J. and Strozzi, Francesco and Baran, Joachim and Dumontier, Michel and Bonnal, Raoul J. P. and Buels, Robert and Hoehndorf, Robert and Fujisawa, Takatomo and Katayama, Toshiaki and Cock, Peter J. A.},
doi = {10.1186/s13326-016-0067-z},
journal = {Journal of Biomedical Semantics},
volume = 7,
place = {United States},
year = {2016},
month = {6}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 4 works (citation information provided by Web of Science)

    Works referencing / citing this record:

    BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services
    journal, January 2019

