DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Biocuration: Distilling data into knowledge

Journal Article · PLoS Biology (Online)

Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Philosophical Transactions of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers' experimental work builds upon years and (collectively) billions of dollars' worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources' development and management.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Contributing Organization:
International Society for Biocuration
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1559141
Journal Information:
PLoS Biology (Online), Vol. 16, Issue 4; ISSN 1545-7885
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 42 works
Citation information provided by
Web of Science

Cited By (6)

A nomenclature and classification for the congenital myasthenic syndromes: preparing for FAIR data in the genomic era journal November 2018
COPO: a metadata platform for brokering FAIR data in the life sciences journal January 2020
Ten quick tips for biocuration journal May 2019
Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems journal January 2018
ProPheno 1.0: An Online Dataset for Accelerating the Complete Characterization of the Human Protein-Phenotype Landscape in Biomedical Literature conference February 2020