OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Journal Article · Journal of Biomedical Semantics
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [8];  [9];  [10];  [11]
  1. SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  3. Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
  4. CODAMONO, Toronto, ON (Canada)
  5. Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
  6. National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
  7. Univ. of California, Berkeley, CA (United States)
  8. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
  9. National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
  10. Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
  11. The James Hutton Inst., Dundee, Scotland (United Kingdom)

© 2016 Bolleman et al. Background: Nucleotide and protein sequence feature annotations are essential to understanding biology at the genomic, transcriptomic, and proteomic levels. For users querying biological annotations with Semantic Web technologies, there was no standard for describing this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned "omics" areas. Representing sequence positions in one data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
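To make "location information as subject-predicate-object triples" concrete, the following is a minimal sketch and is not taken from the article. It assumes the FALDO namespace and the term names `Region`, `ExactPosition`, `ForwardStrandPosition`, `ReverseStrandPosition`, `location`, `begin`, `end`, `position` and `reference` as they appear in the published ontology. The `example.org` IRIs, the helper function names and the coordinates are hypothetical and chosen only for illustration.

```python
# FALDO namespace (assumed from the published ontology); example.org IRIs are hypothetical.
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def position_triples(node, offset, reference, strand_class):
    """Return the triples for one exact, stranded position on a reference sequence."""
    return [
        (node, RDF_TYPE, FALDO + "ExactPosition"),
        (node, RDF_TYPE, FALDO + strand_class),
        (node, FALDO + "position", offset),
        (node, FALDO + "reference", reference),
    ]


def region_triples(feature, begin, end, reference, forward=True):
    """Describe a feature's location as a region with a begin and an end position."""
    strand = "ForwardStrandPosition" if forward else "ReverseStrandPosition"
    region = feature + "/region"
    begin_node, end_node = region + "/begin", region + "/end"
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
        (region, FALDO + "begin", begin_node),
        (region, FALDO + "end", end_node),
    ]
    triples += position_triples(begin_node, begin, reference, strand)
    triples += position_triples(end_node, end, reference, strand)
    return triples


# Hypothetical feature spanning positions 100 to 250 on a hypothetical chromosome.
triples = region_triples(EX + "gene1", 100, 250, EX + "chr1")
for s, p, o in triples:
    print(s, p, o)
```

Because each position is its own node that points at a reference sequence, the same pattern could in principle be reused for protein or glycan coordinates by changing only the reference IRI.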

Research Organization:
Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1299144
Alternate ID(s):
OSTI ID: 1299149; OSTI ID: 1379397
Journal Information:
Journal of Biomedical Semantics, Vol. 7; ISSN 2041-1480
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 14 works
Citation information provided by
Web of Science

Cited By (9)

The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation journal November 2019
YummyData: providing high-quality open life science data journal January 2018
HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes journal February 2020
DNA Data Bank of Japan journal October 2016
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species journal November 2016
BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services journal January 2019
TogoGenome/TogoStanza: modularized Semantic Web genome database journal January 2019
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains journal January 2014
Preserving sequence annotations across reference sequences journal January 2014