SciTech Connect

Title: FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Nucleotide and protein sequence feature annotations are essential to understanding biology at the genomic, transcriptomic, and proteomic levels. Although Semantic Web technologies can be used to query biological annotations, there was no standard for describing this potentially complex location information as subject-predicate-object triples. In this paper, we present an ontology, the Feature Annotation Location Description Ontology (FALDO), that describes the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned “omics” areas. Representing sequence positions in a single data model that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate access to multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to describe, and potentially merge, sequence annotations from multiple sources in a uniform way. Finally, data sources that use FALDO can be retrieved with federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
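As a rough illustration of what "location information as subject-predicate-object triples" can look like, the sketch below builds a small set of FALDO-style triples for one feature and prints them as N-Triples. It uses only the Python standard library. The `http://example.org/` namespace, the `gene1` and `chr1` identifiers, the coordinates, and the helper names `region_triples` and `to_ntriples` are assumptions made for this example and do not come from the paper.

```python
# Minimal sketch: one feature location expressed as FALDO-style triples.
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def region_triples(feature, reference, begin, end, forward=True):
    """Return (subject, predicate, object) triples locating `feature` on `reference`."""
    region = f"{feature}/region"
    strand = FALDO + ("ForwardStrandPosition" if forward else "ReverseStrandPosition")
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    # Each end of the region is its own position node with a coordinate,
    # a strand, and the sequence it refers to.
    for name, coord in (("begin", begin), ("end", end)):
        pos = f"{region}/{name}"
        triples += [
            (region, FALDO + name, pos),
            (pos, RDF_TYPE, FALDO + "ExactPosition"),
            (pos, RDF_TYPE, strand),
            (pos, FALDO + "position", coord),
            (pos, FALDO + "reference", reference),
        ]
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples; integer objects become xsd:integer literals."""
    lines = []
    for s, p, o in triples:
        obj = f'"{o}"^^<{XSD_INTEGER}>' if isinstance(o, int) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical feature and coordinates.
triples = region_triples(EX + "gene1", EX + "chr1", 100, 250)
print(to_ntriples(triples))
```

Because every position carries its own reference sequence and strand, the same triple pattern can be emitted regardless of the file format the annotation originally came from, which is what allows records from different sources to be queried together.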
  1. SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  3. Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
  4. CODAMONO, Toronto, ON (Canada)
  5. Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
  6. National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
  7. Univ. of California, Berkeley, CA (United States)
  8. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
  9. National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
  10. Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
  11. The James Hutton Inst., Dundee, Scotland (United Kingdom)
Publication Date:
OSTI Identifier:
Grant/Contract Number:
Accepted Manuscript
Journal Name:
Journal of Biomedical Semantics
Additional Journal Information:
Journal Volume: 7; Journal ID: ISSN 2041-1480
Publisher:
BioMed Central
Research Org:
Univ. of California, Berkeley, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
Country of Publication:
United States
Subject:
SPARQL; RDF; Semantic Web; Standardisation; Sequence ontology; Annotation; Data integration; Sequence feature