FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J. P.; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J. A.

doi:10.1186/s13326-016-0067-z

Journal Article · 13 June 2016 · Journal of Biomedical Semantics

DOI: https://doi.org/10.1186/s13326-016-0067-z · OSTI ID: 1299144

Bolleman, Jerven T. ^[1]; Mungall, Christopher J. ^[2]; Strozzi, Francesco ^[3]; Baran, Joachim ^[4]; Dumontier, Michel ^[5]; Bonnal, Raoul J. P. ^[6]; Buels, Robert ^[7]; Hoehndorf, Robert ^[8]; Fujisawa, Takatomo ^[9]; Katayama, Toshiaki ^[10]; Cock, Peter J. A. ^[11]

[1] SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
[2] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
[3] Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
[4] CODAMONO, Toronto, ON (Canada)
[5] Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
[6] National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
[7] Univ. of California, Berkeley, CA (United States)
[8] King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
[9] National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
[10] Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
[11] The James Hutton Inst., Dundee, Scotland (United Kingdom)

© 2016 Bolleman et al.

Background: Nucleotide and protein sequence feature annotations are essential for understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard for describing this potentially complex location information as subject-predicate-object triples.

Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Representing sequence positions in a single data format, independent of file formats, allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations.

Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
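As a rough illustration of what "location information as subject-predicate-object triples" can look like, the sketch below writes out a FALDO-style region in Turtle using only the Python standard library. The class and property names (`faldo:Region`, `faldo:ExactPosition`, `faldo:ForwardStrandPosition`, `faldo:location`, `faldo:begin`, `faldo:end`, `faldo:position`, `faldo:reference`) are taken from the published FALDO vocabulary as we understand it. The `example.org` IRIs, the coordinates, and the helper name `faldo_region_turtle` are hypothetical and are not part of this record.

```python
# Minimal sketch, not an official FALDO tool: emit Turtle for one feature
# location. The example.org IRIs and the helper name are hypothetical.
FALDO = "http://biohackathon.org/resource/faldo#"


def faldo_region_turtle(feature_iri, reference_iri, begin, end):
    """Return Turtle describing a forward-strand region of a feature."""
    lines = [
        f"@prefix faldo: <{FALDO}> .",
        "",
        # The feature points at a region; the region has a begin and an end.
        f"<{feature_iri}> faldo:location <{feature_iri}#region> .",
        f"<{feature_iri}#region> a faldo:Region ;",
        f"    faldo:begin <{feature_iri}#begin> ;",
        f"    faldo:end <{feature_iri}#end> .",
    ]
    # Each boundary is its own position resource tied to a reference sequence.
    for name, pos in (("begin", begin), ("end", end)):
        lines += [
            f"<{feature_iri}#{name}> a faldo:ExactPosition, faldo:ForwardStrandPosition ;",
            f"    faldo:position {pos} ;",
            f"    faldo:reference <{reference_iri}> .",
        ]
    return "\n".join(lines) + "\n"


turtle = faldo_region_turtle(
    "http://example.org/gene1",   # hypothetical feature IRI
    "http://example.org/chr1",    # hypothetical reference sequence IRI
    100,
    250,
)
print(turtle)
```

Because each boundary carries its own reference and strand, the same pattern should extend to features whose start and end are described differently, which is one way to read the abstract's point about potentially complex locations.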

Research Organization: Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Basic Energy Sciences (BES)

Grant/Contract Number: AC02-05CH11231

OSTI ID: 1299144

Alternate ID(s): OSTI ID: 1299149; OSTI ID: 1379397

Journal Information: Journal of Biomedical Semantics, Vol. 7; ISSN 2041-1480

Publisher: BioMed Central

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 14 works

Citation information provided by
Web of Science

Cited By (9)

The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation. van Dam, Jesse C. J.; Koehorst, Jasper J.; Vik, Jon Olav. Scientific Data, Vol. 6, Issue 1 (journal, November 2019). https://doi.org/10.1038/s41597-019-0263-7
YummyData: providing high-quality open life science data. Yamamoto, Yasunori; Yamaguchi, Atsuko; Splendiani, Andrea. Database, Vol. 2018 (journal, January 2018). https://doi.org/10.1093/database/bay022
HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes. Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine. GigaScience, Vol. 9, Issue 2 (journal, February 2020). https://doi.org/10.1093/gigascience/giaa003
DNA Data Bank of Japan. Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo. Nucleic Acids Research, Vol. 45, Issue D1 (journal, October 2016). https://doi.org/10.1093/nar/gkw1001
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Mungall, Christopher J.; McMurry, Julie A.; Köhler, Sebastian. Nucleic Acids Research, Vol. 45, Issue D1 (journal, November 2016). https://doi.org/10.1093/nar/gkw1128
BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services. Katayama, Toshiaki; Kawashima, Shuichi; Micklem, Gos. F1000Research, Vol. 8 (journal, January 2019). https://doi.org/10.12688/f1000research.18238.1
TogoGenome/TogoStanza: modularized Semantic Web genome database. Katayama, Toshiaki; Kawashima, Shuichi; Okamoto, Shinobu. Database, Vol. 2019 (journal, January 2019). https://doi.org/10.1093/database/bay132
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains. Katayama, Toshiaki; Wilkinson, Mark D.; Aoki-Kinoshita, Kiyoko F. Journal of Biomedical Semantics, Vol. 5, Issue 1 (journal, January 2014). https://doi.org/10.1186/2041-1480-5-5
Preserving sequence annotations across reference sequences. Tatum, Zuotian; Roos, Marco; Gibson, Andrew P. Journal of Biomedical Semantics, Vol. 5, Issue Suppl 1 (journal, January 2014). https://doi.org/10.1186/2041-1480-5-s1-s6

Similar Records

The NamesforLife Semantic Index of Phenotypic and Genotypic Data for Systems Biology

Technical Report · 28 August 2018 · OSTI ID:1299144

Garrity, George M.; Parker, Charles T.

BioWarehouse: a bioinformatics database warehouse toolkit

Journal Article · 23 March 2006 · BMC Bioinformatics · OSTI ID:1299144

Lee, Thomas J.; Pouliot, Yannick; Wagner, Valerie; +4 more

Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

Journal Article · 12 October 2016 · Nucleic Acids Research · OSTI ID:1299144

Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; +7 more

Related Subjects

60 APPLIED LIFE SCIENCES
59 BASIC BIOLOGICAL SCIENCES
SPARQL
RDF
Semantic Web
Standardisation
Sequence ontology
Annotation
Data integration
Sequence feature
96 KNOWLEDGE MANAGEMENT AND PRESERVATION
