Biocuration: Distilling data into knowledge
Abstract
Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Proceedings of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers' experimental work builds upon years and (collectively) billions of dollars' worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources' development and management.
- Authors:
- International Society for Biocuration
- Publication Date:
- April 2018
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- Contributing Org.:
- International Society for Biocuration
- OSTI Identifier:
- 1559141
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- PLoS Biology (Online)
- Additional Journal Information:
- Journal Name: PLoS Biology (Online); Journal Volume: 16; Journal Issue: 4; Journal ID: ISSN 1545-7885
- Publisher:
- Public Library of Science
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 96 KNOWLEDGE MANAGEMENT AND PRESERVATION
Citation Formats
International Society for Biocuration. Biocuration: Distilling data into knowledge. United States: N. p., 2018. Web. doi:10.1371/journal.pbio.2002846.
International Society for Biocuration. Biocuration: Distilling data into knowledge. United States. https://doi.org/10.1371/journal.pbio.2002846
International Society for Biocuration. 2018. "Biocuration: Distilling data into knowledge". United States. https://doi.org/10.1371/journal.pbio.2002846. https://www.osti.gov/servlets/purl/1559141.
@article{osti_1559141,
title = {Biocuration: Distilling data into knowledge},
author = {International Society for Biocuration},
abstractNote = {Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Proceedings of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers' experimental work builds upon years and (collectively) billions of dollars' worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources' development and management.},
doi = {10.1371/journal.pbio.2002846},
journal = {PLoS Biology (Online)},
number = 4,
volume = 16,
place = {United States},
year = {2018},
month = {4}
}
Works referenced in this record:
Gene Wiki Reviews—Raising the quality and accessibility of information about the human genome
journal, November 2016
- Tsueng, Ginger; Good, Benjamin M.; Ping, Peipei
- Gene, Vol. 592, Issue 2
Finding scientific topics
journal, February 2004
- Griffiths, T. L.; Steyvers, M.
- Proceedings of the National Academy of Sciences, Vol. 101, Issue Supplement 1
The International Nucleotide Sequence Database Collaboration
journal, December 2015
- Cochrane, Guy; Karsch-Mizrachi, Ilene; Takagi, Toshihisa
- Nucleic Acids Research, Vol. 44, Issue D1
Natural Language Processing in aid of FlyBase curators
journal, January 2008
- Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian
- BMC Bioinformatics, Vol. 9, Issue 1
Sharing Detailed Research Data Is Associated with Increased Citation Rate
journal, March 2007
- Piwowar, Heather A.; Day, Roger S.; Fridsma, Douglas B.
- PLoS ONE, Vol. 2, Issue 3
ClinVar: public archive of relationships among sequence variation and human phenotype
journal, November 2013
- Landrum, Melissa J.; Lee, Jennifer M.; Riley, George R.
- Nucleic Acids Research, Vol. 42, Issue D1
Community challenges in biomedical text mining over 10 years: success, failure and the future
journal, May 2015
- Huang, Chung-Chi; Lu, Zhiyong
- Briefings in Bioinformatics, Vol. 17, Issue 1
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
journal, January 2014
- Cejuela, J. M.; McQuilton, P.; Ponting, L.
- Database, Vol. 2014
Crowdsourcing in biomedicine: challenges and opportunities
journal, April 2015
- Khare, Ritu; Good, Benjamin M.; Leaman, Robert
- Briefings in Bioinformatics, Vol. 17, Issue 1
Value, but high costs in post-deposition data curation
journal, January 2016
- ten Hoopen, Petra; Amid, Clara; Buttigieg, Pier Luigi
- Database, Vol. 2016
Overview of the interactive task in BioCreative V
journal, January 2016
- Wang, Qinghua; S. Abdul, Shabbir; Almeida, Lara
- Database, Vol. 2016
MetaBar - a tool for consistent contextual data acquisition and standards compliant submission
journal, January 2010
- Hankeln, Wolfgang; Buttigieg, Pier Luigi; Fink, Dennis
- BMC Bioinformatics, Vol. 11, Issue 1
Evaluation of biomedical text-mining systems: Lessons learned from information retrieval
journal, January 2005
- Hersh, W.
- Briefings in Bioinformatics, Vol. 6, Issue 4
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
journal, January 2013
- Arighi, C. N.; Carterette, B.; Cohen, K. B.
- Database, Vol. 2013
OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive
journal, March 2017
- Young, Jasmine Y.; Westbrook, John D.; Feng, Zukang
- Structure, Vol. 25, Issue 3
Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications
journal, August 2014
- Hazen, Benjamin T.; Boone, Christopher A.; Ezell, Jeremy D.
- International Journal of Production Economics, Vol. 154
Gene Ontology: tool for the unification of biology
journal, May 2000
- Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.
- Nature Genetics, Vol. 25, Issue 1
Canto: an online tool for community literature curation
journal, February 2014
- Rutherford, Kim M.; Harris, Midori A.; Lock, Antonia
- Bioinformatics, Vol. 30, Issue 12
Gene name errors are widespread in the scientific literature
journal, August 2016
- Ziemann, Mark; Eren, Yotam; El-Osta, Assam
- Genome Biology, Vol. 17, Issue 1
SourceData: a semantic platform for curating and searching figures
journal, November 2017
- Liechti, Robin; George, Nancy; Götz, Lou
- Nature Methods, Vol. 14, Issue 11
RightField: embedding ontology annotation in spreadsheets
journal, May 2011
- Wolstencroft, K.; Owen, S.; Horridge, M.
- Bioinformatics, Vol. 27, Issue 14
On the reproducibility of science: unique identification of research resources in the biomedical literature
journal, January 2013
- Vasilevsky, Nicole A.; Brush, Matthew H.; Paddock, Holly
- PeerJ, Vol. 1
Achieving human and machine accessibility of cited data in scholarly publications
journal, January 2015
- Starr, Joan; Castro, Eleni; Crosas, Mercè
- PeerJ Computer Science, Vol. 1
DataUp: A tool to help researchers describe and share tabular data
journal, January 2014
- Strasser, Carly; Kunze, John; Abrams, Stephen
- F1000Research, Vol. 3
Model organism databases: essential resources that need the support of both funders and users
journal, June 2016
- Oliver, Stephen G.; Lock, Antonia; Harris, Midori A.
- BMC Biology, Vol. 14, Issue 1
Identification of type 2 diabetes subgroups through topological analysis of patient similarity
journal, October 2015
- Li, Li; Cheng, Wei-Yi; Glicksberg, Benjamin S.
- Science Translational Medicine, Vol. 7, Issue 311
The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease
journal, July 2015
- Groza, Tudor; Köhler, Sebastian; Moldenhauer, Dawid
- The American Journal of Human Genetics, Vol. 97, Issue 1
When Data Sharing Gets Close to 100%: What Human Paleogenetics Can Teach the Open Science Movement
journal, March 2015
- Anagnostou, Paolo; Capocasa, Marco; Milia, Nicola
- PLOS ONE, Vol. 10, Issue 3
Will a Biological Database Be Different from a Biological Journal?
journal, January 2005
- Bourne, Philip
- PLoS Computational Biology, Vol. 1, Issue 3
Data reuse and the open data citation advantage
journal, January 2013
- Piwowar, Heather A.; Vision, Todd J.
- PeerJ, Vol. 1
Text mining for the biocuration workflow
journal, January 2012
- Hirschman, L.; Burns, G. A. P. C.; Krallinger, M.
- Database, Vol. 2012
The Biocurator: Connecting and Enhancing Scientific Data
journal, January 2006
- Salimi, Nima; Vita, Randi
- PLoS Computational Biology, Vol. 2, Issue 10
The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition
journal, October 2016
- Deutsch, Eric W.; Csordas, Attila; Sun, Zhi
- Nucleic Acids Research, Vol. 45, Issue D1
BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences
journal, January 2016
- McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe
- Database, Vol. 2016
Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment
journal, October 2015
- Shameer, Khader; Tripathi, Lokesh P.; Kalari, Krishna R.
- Briefings in Bioinformatics, Vol. 17, Issue 5
The Human Phenotype Ontology in 2017
journal, November 2016
- Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark
- Nucleic Acids Research, Vol. 45, Issue D1
The future of biocuration
journal, September 2008
- Howe, Doug; Costanzo, Maria; Fey, Petra
- Nature, Vol. 455, Issue 7209
A MOD(ern) perspective on literature curation
journal, March 2010
- Hirschman, Jodi; Berardini, Tanya Z.; Drabkin, Harold J.
- Molecular Genetics and Genomics, Vol. 283, Issue 5
On expert curation and scalability: UniProtKB/Swiss-Prot as a case study
journal, July 2017
- Poux, Sylvain; Arighi, Cecilia N.; Magrane, Michele
- Bioinformatics, Vol. 33, Issue 21
The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data
journal, November 2013
- Köhler, Sebastian; Doelken, Sandra C.; Mungall, Christopher J.
- Nucleic Acids Research, Vol. 42, Issue D1
Web Apollo: a web-based genomic annotation editing platform
journal, January 2013
- Lee, Eduardo; Helt, Gregg A.; Reese, Justin T.
- Genome Biology, Vol. 14, Issue 8
Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
journal, June 2017
- McMurry, Julie A.; Juty, Nick; Blomberg, Niklas
- PLOS Biology, Vol. 15, Issue 6
Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency
journal, November 2015
- Bone, William P.; Washington, Nicole L.; Buske, Orion J.
- Genetics in Medicine, Vol. 18, Issue 6
Clinical assessment incorporating a personal genome
journal, May 2010
- Ashley, Euan A.; Butte, Atul J.; Wheeler, Matthew T.
- The Lancet, Vol. 375, Issue 9725
Navigating the Phenotype Frontier: The Monarch Initiative
journal, August 2016
- McMurry, Julie A.; Köhler, Sebastian; Washington, Nicole L.
- Genetics, Vol. 203, Issue 4
Protein interaction data curation: the International Molecular Exchange (IMEx) consortium
journal, March 2012
- Orchard, Sandra; Kerrien, Samuel; Abbani, Sara
- Nature Methods, Vol. 9, Issue 4
Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
journal, September 2008
- Lintott, Chris J.; Schawinski, Kevin; Slosar, Anže
- Monthly Notices of the Royal Astronomical Society, Vol. 389, Issue 3
Measuring the value of information: The information-intensive organization
journal, January 1993
- Glazer, R.
- IBM Systems Journal, Vol. 32, Issue 1
The international nucleotide sequence database collaboration
journal, November 2017
- Karsch-Mizrachi, Ilene; Takagi, Toshihisa; Cochrane, Guy
- Nucleic Acids Research, Vol. 46, Issue D1
Model organism databases: essential resources that need the support of both funders and users
text, January 2016
- Oliver, Stephen; Lock, Antonia; Harris, Midori
- Apollo - University of Cambridge Repository
Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
text, January 2017
- McMurry, Julie A.; Juty, Nick; Blomberg, Niklas
- Public Library of Science (PLoS)
On the Reproducibility of Science: Unique Identification of Research Resources in the Biomedical Literature
image, January 2014
- Vasilevsky, Nicole; Brush, Matthew; Paddock, Holly
- figshare
The International Nucleotide Sequence Database Collaboration
journal, November 2012
- Nakamura, Y.; Cochrane, G.; Karsch-Mizrachi, I.
- Nucleic Acids Research, Vol. 41, Issue D1
Canto: an online tool for community literature curation
text, January 2014
- Rutherford, Kim M.; Harris, Midori; Lock, Antonia
- Apollo - University of Cambridge Repository
The International Nucleotide Sequence Database Collaboration
journal, November 2011
- Karsch-Mizrachi, I.; Nakamura, Y.; Cochrane, G.
- Nucleic Acids Research, Vol. 40, Issue D1
The Human Phenotype Ontology in 2017
text, January 2017
- Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark
- Apollo - University of Cambridge Repository
Model organism databases: essential resources that need the support of both funders and users
text, January 2021
- Oliver, Stephen G.; Lock, Antonia; Harris, Midori A.
- The Francis Crick Institute
Overview of the interactive task in BioCreative V
text, January 2016
- Wang, Qinghua; Rinaldi, Fabio; Jimenez, Silvia
- Oxford University Press
Overview of the Interactive Task in BioCreative V
text, January 2015
- Wang, Qinghua; Rinaldi, Fabio; et al.
- BioCreative
Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
text, January 2008
- Lintott, Chris J.; Schawinski, Kevin; Slosar, Anze
- arXiv
Works referencing / citing this record:
A nomenclature and classification for the congenital myasthenic syndromes: preparing for FAIR data in the genomic era
journal, November 2018
- Thompson, Rachel; Abicht, Angela; Beeson, David
- Orphanet Journal of Rare Diseases, Vol. 13, Issue 1
COPO: a metadata platform for brokering FAIR data in the life sciences
journal, January 2020
- Shaw, Felix; Etuk, Anthony; Minotto, Alice
- F1000Research, Vol. 9
Ten quick tips for biocuration
journal, May 2019
- Tang, Y. Amy; Pichler, Klemens; Füllgrabe, Anja
- PLOS Computational Biology, Vol. 15, Issue 5
Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems
journal, January 2018
- Dahdul, Wasila; Manda, Prashanti; Cui, Hong
- Database, Vol. 2018