FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
Abstract
© 2016 Bolleman et al. Background: Nucleotide and protein sequence feature annotations are essential for understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned "omics" areas. Representing sequence positions in one data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be queried using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
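As an illustration of what the abstract means by describing a location as subject-predicate-object triples, the following is a minimal Python sketch, not taken from the paper. It assumes the FALDO namespace `http://biohackathon.org/resource/faldo#` and the terms `Region`, `ExactPosition`, `ForwardStrandPosition`, `ReverseStrandPosition`, `location`, `begin`, `end`, `position`, and `reference`. The `example.org` feature and sequence URIs, the coordinates, and the helper function names are hypothetical.

```python
# Sketch: one feature location expressed as FALDO-style triples.
# Assumption: FALDO namespace and term names as published with the ontology.
# The example.org URIs and the coordinates are hypothetical illustration data.

FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical namespace


def position_triples(node, coordinate, reference, strand_class):
    """Triples for one exact, stranded position on a reference sequence."""
    return [
        (node, RDF_TYPE, FALDO + "ExactPosition"),
        (node, RDF_TYPE, FALDO + strand_class),
        (node, FALDO + "position", coordinate),
        (node, FALDO + "reference", reference),
    ]


def region_triples(feature, begin, end, reference, forward=True):
    """Triples linking a feature to a region with begin and end positions."""
    strand_class = "ForwardStrandPosition" if forward else "ReverseStrandPosition"
    region = feature + "/region"
    begin_node = region + "/begin"
    end_node = region + "/end"
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
        (region, FALDO + "begin", begin_node),
        (region, FALDO + "end", end_node),
    ]
    triples += position_triples(begin_node, begin, reference, strand_class)
    triples += position_triples(end_node, end, reference, strand_class)
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; integers become xsd:integer literals."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, int):
            obj = '"%d"^^<%s>' % (o, XSD_INTEGER)
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


triples = region_triples(EX + "gene1", 100, 250, EX + "chr1")
print(to_ntriples(triples))
```

Because the strand and the reference sequence are attached to each position rather than encoded in a file-format-specific column, the same triples could in principle be loaded into any triple store and queried alongside annotations from other sources.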
- Authors:
- SIB Swiss Inst. of Bioinformatics, Geneva (Switzerland). Centre Medical Univ.
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
- Parco Tecnologico Padano (PTP), Lodi (Italy). Center for Research and the Study of Food and Agriculture (CeRSA)
- CODAMONO, Toronto, ON (Canada)
- Stanford Center for Biomedical Informatics Research, Stanford, CA (United States)
- National Inst. of Molecular Genetics (INGM), Milan (Italy). Integrative Biology Program
- Univ. of California, Berkeley, CA (United States)
- King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia). Computer, Electrical and Mathematical Science and Engineering Division. Computational Bioscience Dept.
- National Inst. of Genetics, Shizuoka (Japan). Research Organization of Information and Systems. Center for Information Biology
- Research Organization of Information and Systems, Tokyo (Japan). Database Center for Life Science
- The James Hutton Inst., Dundee, Scotland (United Kingdom)
- Publication Date:
- 2016-06-13
- Research Org.:
- Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1299144
- Alternate Identifier(s):
- OSTI ID: 1299149; OSTI ID: 1379397
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- Journal of Biomedical Semantics
- Additional Journal Information:
- Journal Volume: 7; Journal ID: ISSN 2041-1480
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; SPARQL; RDF; Semantic Web; Standardisation; Sequence ontology; Annotation; Data integration; Sequence feature
Citation Formats
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States: N. p., 2016. Web. doi:10.1186/s13326-016-0067-z.
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, & Cock, Peter J. A. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. United States. https://doi.org/10.1186/s13326-016-0067-z
Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul J. P., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter J. A. 2016. "FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation". United States. https://doi.org/10.1186/s13326-016-0067-z. https://www.osti.gov/servlets/purl/1299144.
@article{osti_1299144,
title = {FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation},
author = {Bolleman, Jerven T. and Mungall, Christopher J. and Strozzi, Francesco and Baran, Joachim and Dumontier, Michel and Bonnal, Raoul J. P. and Buels, Robert and Hoehndorf, Robert and Fujisawa, Takatomo and Katayama, Toshiaki and Cock, Peter J. A.},
abstractNote = {© 2016 Bolleman et al. Background: Nucleotide and protein sequence feature annotations are essential for understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in the coordinate systems of the aforementioned "omics" areas. Representing sequence positions in one data format that is independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be queried using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.},
doi = {10.1186/s13326-016-0067-z},
url = {https://www.osti.gov/biblio/1299144},
journal = {Journal of Biomedical Semantics},
issn = {2041-1480},
volume = 7,
place = {United States},
year = {2016},
month = jun
}
Works referenced in this record:
The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications
journal, January 2011
- Katayama, Toshiaki; Wilkinson, Mark D.; Vos, Rutger
- Journal of Biomedical Semantics, Vol. 2, Issue 1
The Bioperl Toolkit: Perl Modules for the Life Sciences
journal, October 2002
- Stajich, J. E.
- Genome Research, Vol. 12, Issue 10
BioJava: an open-source framework for bioinformatics in 2012
journal, August 2012
- Prlic, A.; Yates, A.; Bliven, S. E.
- Bioinformatics, Vol. 28, Issue 20
The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies
journal, January 2013
- Katayama, Toshiaki; Wilkinson, Mark D.; Micklem, Gos
- Journal of Biomedical Semantics, Vol. 4, Issue 1
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains
journal, January 2014
- Katayama, Toshiaki; Wilkinson, Mark D.; Aoki-Kinoshita, Kiyoko F.
- Journal of Biomedical Semantics, Vol. 5, Issue 1
The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows
journal, January 2010
- Katayama, Toshiaki; Arakawa, Kazuharu; Nakao, Mitsuteru
- Journal of Biomedical Semantics, Vol. 1, Issue 1
The RINGS Resource for Glycome Informatics Analysis and Data Mining on the Web
journal, August 2010
- Akune, Yukie; Hosoda, Masae; Kaiya, Sakiko
- OMICS: A Journal of Integrative Biology, Vol. 14, Issue 4
JBrowse: A next-generation genome browser
journal, July 2009
- Skinner, M. E.; Uzilov, A. V.; Stein, L. D.
- Genome Research, Vol. 19, Issue 9
A Chado case study: an ontology-based modular schema for representing genome-associated biological information
journal, July 2007
- Mungall, Christopher J.; Emmert, David B.
- Bioinformatics, Vol. 23, Issue 13
BioRuby: bioinformatics software for the Ruby programming language
journal, August 2010
- Goto, N.; Prins, P.; Nakao, M.
- Bioinformatics, Vol. 26, Issue 20
GenBank
journal, November 2012
- Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen
- Nucleic Acids Research, Vol. 41, Issue D1
DDBJ new system and service refactoring
journal, November 2012
- Ogasawara, Osamu; Mashima, Jun; Kodama, Yuichi
- Nucleic Acids Research, Vol. 41, Issue D1
An ontology based query engine for querying biological sequences
journal, October 2013
- Devisscher, Martijn; De Meyer, Tim; Van Criekinge, Wim
- EMBnet.journal, Vol. 19, Issue B
Biopython: freely available Python tools for computational molecular biology and bioinformatics
journal, March 2009
- Cock, P. J. A.; Antao, T.; Chang, J. T.
- Bioinformatics, Vol. 25, Issue 11
Facing growth in the European Nucleotide Archive
journal, November 2012
- Cochrane, Guy; Alako, Blaise; Amid, Clara
- Nucleic Acids Research, Vol. 41, Issue D1
The terminal peptides of insulin
journal, January 1949
- Sanger, F.
- Biochemical Journal, Vol. 45, Issue 5
GLYCOSCIENCES.de: an Internet portal to support glycomics and glycobiology research
journal, May 2006
- Lütteke, Thomas; Bohne-Lang, Andreas; Loss, Alexander
- Glycobiology, Vol. 16, Issue 5
Bacterial Carbohydrate Structure Database 3: Principles and Realization
journal, December 2010
- Toukach, Philip V.
- Journal of Chemical Information and Modeling, Vol. 51, Issue 1
GlycomeDB--a unified database for carbohydrate structures
journal, November 2010
- Ranzinger, R.; Herget, S.; von der Lieth, C. -W.
- Nucleic Acids Research, Vol. 39, Issue Database
A standard variation file format for human genome sequences
journal, January 2010
- Reese, Martin G.; Moore, Barry; Batchelor, Colin
- Genome Biology, Vol. 11, Issue 8
UniCarbKB: building a knowledge platform for glycoproteomics
journal, November 2013
- Campbell, Matthew P.; Peterson, Robyn; Mariethoz, Julien
- Nucleic Acids Research, Vol. 42, Issue D1
GFVO: the Genomic Feature and Variation Ontology
journal, January 2015
- Baran, Joachim; Durgahee, Bibi Sehnaaz Begum; Eilbeck, Karen
- PeerJ, Vol. 3
Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases
journal, January 2007
- Alekseyenko, A. V.; Lee, C. J.
- Bioinformatics, Vol. 23, Issue 11
Spatiotemporal persistence of multiple, diverse clades and toxins of Corynebacterium diphtheriae
journal, March 2021
- Will, Robert C.; Ramamurthy, Thandavarayan; Sharma, Naresh Chand
- Nature Communications, Vol. 12, Issue 1
Targeted editing and evolution of engineered ribosomes in vivo by filtered editing
journal, January 2022
- Radford, Felix; Elliott, Shane D.; Schepartz, Alanna
- Nature Communications, Vol. 13, Issue 1
Works referencing / citing this record:
The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation
journal, November 2019
- van Dam, Jesse C. J.; Koehorst, Jasper J.; Vik, Jon Olav
- Scientific Data, Vol. 6, Issue 1
YummyData: providing high-quality open life science data
journal, January 2018
- Yamamoto, Yasunori; Yamaguchi, Atsuko; Splendiani, Andrea
- Database, Vol. 2018
HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes
journal, February 2020
- Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine
- GigaScience, Vol. 9, Issue 2
DNA Data Bank of Japan
journal, October 2016
- Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo
- Nucleic Acids Research, Vol. 45, Issue D1
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species
journal, November 2016
- Mungall, Christopher J.; McMurry, Julie A.; Köhler, Sebastian
- Nucleic Acids Research, Vol. 45, Issue D1
BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services
journal, January 2019
- Katayama, Toshiaki; Kawashima, Shuichi; Micklem, Gos
- F1000Research, Vol. 8
TogoGenome/TogoStanza: modularized Semantic Web genome database
journal, January 2019
- Katayama, Toshiaki; Kawashima, Shuichi; Okamoto, Shinobu
- Database, Vol. 2019
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains
journal, January 2014
- Katayama, Toshiaki; Wilkinson, Mark D.; Aoki-Kinoshita, Kiyoko F.
- Journal of Biomedical Semantics, Vol. 5, Issue 1
Preserving sequence annotations across reference sequences
journal, January 2014
- Tatum, Zuotian; Roos, Marco; Gibson, Andrew P.
- Journal of Biomedical Semantics, Vol. 5, Issue Suppl 1