OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Abstract

© 2016 Bolleman et al. Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, however, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned "omics" areas. Representing sequence positions in one data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
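To make "location information as subject-predicate-object triples" concrete, here is a minimal, hypothetical Python sketch; it is not taken from the paper or from any FALDO tooling. It assumes the FALDO namespace `http://biohackathon.org/resource/faldo#`, 1-based inclusive positions, forward-strand features only (reverse-strand conventions are not covered), and invented `http://example.org/` IRIs. The names `Feature`, `faldo_triples`, `to_ntriples` and `region_length` are illustrative. The circular-sequence helper assumes that a forward-strand region whose begin exceeds its end wraps through the origin.

```python
# Hypothetical sketch: FALDO-style location triples for one feature.
# Assumptions: FALDO namespace below, 1-based inclusive positions, forward
# strand only, invented example.org IRIs. Not the authors' implementation.
from dataclasses import dataclass

FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass
class Feature:
    iri: str        # IRI of the annotated feature (hypothetical)
    reference: str  # IRI of the sequence the coordinates refer to
    begin: int      # 1-based, inclusive, forward strand
    end: int        # 1-based, inclusive, forward strand


def _iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def faldo_triples(f: Feature) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) terms for one exact forward-strand region."""
    region = f"{f.iri}/region"  # hypothetical IRI scheme for the region node
    triples = [
        (_iri(f.iri), _iri(FALDO + "location"), _iri(region)),
        (_iri(region), _iri(RDF_TYPE), _iri(FALDO + "Region")),
    ]
    for role, pos in (("begin", f.begin), ("end", f.end)):
        node = f"{region}/{role}"
        triples += [
            (_iri(region), _iri(FALDO + role), _iri(node)),
            (_iri(node), _iri(RDF_TYPE), _iri(FALDO + "ExactPosition")),
            (_iri(node), _iri(RDF_TYPE), _iri(FALDO + "ForwardStrandPosition")),
            (_iri(node), _iri(FALDO + "position"), f'"{pos}"^^{_iri(XSD_INTEGER)}'),
            (_iri(node), _iri(FALDO + "reference"), _iri(f.reference)),
        ]
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise the terms as N-Triples lines, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def region_length(begin: int, end: int, seq_len: int, circular: bool) -> int:
    """Residues covered by a forward-strand region (1-based, inclusive).

    Assumption: on a circular sequence, begin > end means the region wraps
    through the origin (position seq_len is followed by position 1).
    """
    if begin <= end:
        return end - begin + 1
    if not circular:
        raise ValueError("begin > end is only meaningful on a circular sequence")
    return (seq_len - begin + 1) + end


if __name__ == "__main__":
    gene = Feature("http://example.org/gene1", "http://example.org/chr1", 10, 20)
    print(to_ntriples(faldo_triples(gene)))
    print(region_length(95, 5, seq_len=100, circular=True))  # wraps the origin
```

Running the module prints twelve N-Triples lines for the example gene, followed by 11 for the wrap-around region (positions 95 to 100 plus 1 to 5). Plain strings keep the sketch dependency-free; a real pipeline would more likely use an RDF library and the published FALDO ontology file.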

Authors:
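Bolleman, Jerven T. [1]; Mungall, Christopher J. [2]; Strozzi, Francesco [3]; Baran, Joachim [4]; Dumontier, Michel [5]; Bonnal, Raoul J. P. [6]; Buels, Robert [7]; Hoehndorf, Robert [8]; Fujisawa, Takatomo [9]; Katayama, Toshiaki [10]; Cock, Peter J. A. [11]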
  1. SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  3. Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
  4. CODAMONO, Toronto, ON (Canada)
  5. Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
  6. National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
  7. Univ. of California, Berkeley, CA (United States)
  8. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
  9. National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
  10. Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
  11. The James Hutton Inst., Dundee, Scotland (United Kingdom)
Publication Date:
June 13, 2016
Research Org.:
Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1299144
Alternate Identifier(s):
OSTI ID: 1299149; OSTI ID: 1379397
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Biomedical Semantics
Additional Journal Information:
Journal Volume: 7; Journal ID: ISSN 2041-1480
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; SPARQL; RDF; Semantic Web; Standardisation; Sequence ontology; Annotation; Data integration; Sequence feature; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States: N. p., 2016. Web. doi:10.1186/s13326-016-0067-z.
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, & Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States. https://doi.org/10.1186/s13326-016-0067-z
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. 2016. "FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation". United States. https://doi.org/10.1186/s13326-016-0067-z. https://www.osti.gov/servlets/purl/1299144.
@article{osti_1299144,
title = {FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation},
author = {Bolleman, Jerven T. and Mungall, Christopher J. and Strozzi, Francesco and Baran, Joachim and Dumontier, Michel and Bonnal, Raoul J. P. and Buels, Robert and Hoehndorf, Robert and Fujisawa, Takatomo and Katayama, Toshiaki and Cock, Peter J. A.},
abstractNote = {© 2016 Bolleman et al. Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, however, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned "omics" areas. Representing sequence positions in one data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.},
doi = {10.1186/s13326-016-0067-z},
url = {https://www.osti.gov/biblio/1299144},
journal = {Journal of Biomedical Semantics},
issn = {2041-1480},
volume = 7,
place = {United States},
year = {2016},
month = jun
}


Citation Metrics:
Cited by: 14 works
Citation information provided by Web of Science


Works referencing / citing this record:

The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation
journal, November 2019


YummyData: providing high-quality open life science data
journal, January 2018


HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes
journal, February 2020


DNA Data Bank of Japan
journal, October 2016


The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species
journal, November 2016


TogoGenome/TogoStanza: modularized Semantic Web genome database
journal, January 2019


BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains
journal, January 2014


Preserving sequence annotations across reference sequences
journal, January 2014