DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Biocuration: Distilling data into knowledge

Abstract

Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Proceedings of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers' experimental work builds upon years and (collectively) billions of dollars' worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources' development and management.

Authors:
International Society for Biocuration
Publication Date:
2018-04-16
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
Contributing Org.:
International Society for Biocuration
OSTI Identifier:
1559141
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
PLoS Biology (Online)
Additional Journal Information:
Journal Name: PLoS Biology (Online); Journal Volume: 16; Journal Issue: 4; Journal ID: ISSN 1545-7885
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

International Society for Biocuration. Biocuration: Distilling data into knowledge. United States: N. p., 2018. Web. doi:10.1371/journal.pbio.2002846.
International Society for Biocuration. Biocuration: Distilling data into knowledge. United States. https://doi.org/10.1371/journal.pbio.2002846
International Society for Biocuration. 2018. "Biocuration: Distilling data into knowledge". United States. https://doi.org/10.1371/journal.pbio.2002846. https://www.osti.gov/servlets/purl/1559141.
@article{osti_1559141,
title = {Biocuration: Distilling data into knowledge},
author = {International Society for Biocuration},
abstractNote = {Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Proceedings of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers' experimental work builds upon years and (collectively) billions of dollars' worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources' development and management.},
doi = {10.1371/journal.pbio.2002846},
journal = {PLoS Biology (Online)},
number = 4,
volume = 16,
place = {United States},
year = {2018},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 37 works
Citation information provided by
Web of Science

Works referenced in this record:

Gene Wiki Reviews—Raising the quality and accessibility of information about the human genome
journal, November 2016


Finding scientific topics
journal, February 2004

  • Griffiths, T. L.; Steyvers, M.
  • Proceedings of the National Academy of Sciences, Vol. 101, Issue Supplement 1
  • DOI: 10.1073/pnas.0307752101

The International Nucleotide Sequence Database Collaboration
journal, December 2015

  • Cochrane, Guy; Karsch-Mizrachi, Ilene; Takagi, Toshihisa
  • Nucleic Acids Research, Vol. 44, Issue D1
  • DOI: 10.1093/nar/gkv1323

Natural Language Processing in aid of FlyBase curators
journal, January 2008


Sharing Detailed Research Data Is Associated with Increased Citation Rate
journal, March 2007


On the Future of Genomic Data
journal, February 2011


ClinVar: public archive of relationships among sequence variation and human phenotype
journal, November 2013

  • Landrum, Melissa J.; Lee, Jennifer M.; Riley, George R.
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1113

Community challenges in biomedical text mining over 10 years: success, failure and the future
journal, May 2015

  • Huang, Chung-Chi; Lu, Zhiyong
  • Briefings in Bioinformatics, Vol. 17, Issue 1
  • DOI: 10.1093/bib/bbv024

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
journal, January 2014


Crowdsourcing in biomedicine: challenges and opportunities
journal, April 2015

  • Khare, Ritu; Good, Benjamin M.; Leaman, Robert
  • Briefings in Bioinformatics, Vol. 17, Issue 1
  • DOI: 10.1093/bib/bbv021

Value, but high costs in post-deposition data curation
journal, January 2016


Overview of the interactive task in BioCreative V
journal, January 2016


MetaBar - a tool for consistent contextual data acquisition and standards compliant submission
journal, January 2010

  • Hankeln, Wolfgang; Buttigieg, Pier Luigi; Fink, Dennis
  • BMC Bioinformatics, Vol. 11, Issue 1
  • DOI: 10.1186/1471-2105-11-358

Evaluation of biomedical text-mining systems: Lessons learned from information retrieval
journal, January 2005


An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
journal, January 2013


OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive
journal, March 2017


Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications
journal, August 2014

  • Hazen, Benjamin T.; Boone, Christopher A.; Ezell, Jeremy D.
  • International Journal of Production Economics, Vol. 154
  • DOI: 10.1016/j.ijpe.2014.04.018

Gene Ontology: tool for the unification of biology
journal, May 2000

  • Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.
  • Nature Genetics, Vol. 25, Issue 1
  • DOI: 10.1038/75556

Canto: an online tool for community literature curation
journal, February 2014


Gene name errors are widespread in the scientific literature
journal, August 2016


SourceData: a semantic platform for curating and searching figures
journal, November 2017

  • Liechti, Robin; George, Nancy; Götz, Lou
  • Nature Methods, Vol. 14, Issue 11
  • DOI: 10.1038/nmeth.4471

RightField: embedding ontology annotation in spreadsheets
journal, May 2011


On the reproducibility of science: unique identification of research resources in the biomedical literature
journal, January 2013

  • Vasilevsky, Nicole A.; Brush, Matthew H.; Paddock, Holly
  • PeerJ, Vol. 1
  • DOI: 10.7717/peerj.148

Achieving human and machine accessibility of cited data in scholarly publications
journal, January 2015

  • Starr, Joan; Castro, Eleni; Crosas, Mercè
  • PeerJ Computer Science, Vol. 1
  • DOI: 10.7717/peerj-cs.1

DataUp: A tool to help researchers describe and share tabular data
journal, January 2014


Model organism databases: essential resources that need the support of both funders and users
journal, June 2016


Identification of type 2 diabetes subgroups through topological analysis of patient similarity
journal, October 2015


The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease
journal, July 2015

  • Groza, Tudor; Köhler, Sebastian; Moldenhauer, Dawid
  • The American Journal of Human Genetics, Vol. 97, Issue 1
  • DOI: 10.1016/j.ajhg.2015.05.020

When Data Sharing Gets Close to 100%: What Human Paleogenetics Can Teach the Open Science Movement
journal, March 2015


Will a Biological Database Be Different from a Biological Journal?
journal, January 2005


Data reuse and the open data citation advantage
journal, January 2013


Text mining for the biocuration workflow
journal, January 2012


The Biocurator: Connecting and Enhancing Scientific Data
journal, January 2006


The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition
journal, October 2016

  • Deutsch, Eric W.; Csordas, Attila; Sun, Zhi
  • Nucleic Acids Research, Vol. 45, Issue D1
  • DOI: 10.1093/nar/gkw936

BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences
journal, January 2016

  • McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe
  • Database, Vol. 2016
  • DOI: 10.1093/database/baw075

Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment
journal, October 2015

  • Shameer, Khader; Tripathi, Lokesh P.; Kalari, Krishna R.
  • Briefings in Bioinformatics, Vol. 17, Issue 5
  • DOI: 10.1093/bib/bbv084

The Human Phenotype Ontology in 2017
journal, November 2016

  • Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark
  • Nucleic Acids Research, Vol. 45, Issue D1
  • DOI: 10.1093/nar/gkw1039

The future of biocuration
journal, September 2008

  • Howe, Doug; Costanzo, Maria; Fey, Petra
  • Nature, Vol. 455, Issue 7209
  • DOI: 10.1038/455047a

A MOD(ern) perspective on literature curation
journal, March 2010

  • Hirschman, Jodi; Berardini, Tanya Z.; Drabkin, Harold J.
  • Molecular Genetics and Genomics, Vol. 283, Issue 5
  • DOI: 10.1007/s00438-010-0525-8

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study
journal, July 2017


The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data
journal, November 2013

  • Köhler, Sebastian; Doelken, Sandra C.; Mungall, Christopher J.
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1026

Web Apollo: a web-based genomic annotation editing platform
journal, January 2013


Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency
journal, November 2015

  • Bone, William P.; Washington, Nicole L.; Buske, Orion J.
  • Genetics in Medicine, Vol. 18, Issue 6
  • DOI: 10.1038/gim.2015.137

Navigating the Phenotype Frontier: The Monarch Initiative
journal, August 2016


Protein interaction data curation: the International Molecular Exchange (IMEx) consortium
journal, March 2012

  • Orchard, Sandra; Kerrien, Samuel; Abbani, Sara
  • Nature Methods, Vol. 9, Issue 4
  • DOI: 10.1038/nmeth.1931

Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
journal, September 2008

  • Lintott, Chris J.; Schawinski, Kevin; Slosar, Anže
  • Monthly Notices of the Royal Astronomical Society, Vol. 389, Issue 3
  • DOI: 10.1111/j.1365-2966.2008.13689.x

Measuring the value of information: The information-intensive organization
journal, January 1993


The international nucleotide sequence database collaboration
journal, November 2017

  • Karsch-Mizrachi, Ilene; Takagi, Toshihisa; Cochrane, Guy
  • Nucleic Acids Research, Vol. 46, Issue D1
  • DOI: 10.1093/nar/gkx1097

Model organism databases: essential resources that need the support of both funders and users
text, January 2016

  • Oliver, Stephen; Lock, Antonia; Harris, Midori
  • Apollo - University of Cambridge Repository
  • DOI: 10.17863/cam.4766

On the Reproducibility of Science: Unique Identification of Research Resources in the Biomedical Literature
image, January 2014


The International Nucleotide Sequence Database Collaboration
journal, November 2012

  • Nakamura, Y.; Cochrane, G.; Karsch-Mizrachi, I.
  • Nucleic Acids Research, Vol. 41, Issue D1
  • DOI: 10.1093/nar/gks1084

Canto: an online tool for community literature curation.
text, January 2014

  • Rutherford, Kim M.; Harris, Midori; Lock, Antonia
  • Apollo - University of Cambridge Repository
  • DOI: 10.17863/cam.52154

The International Nucleotide Sequence Database Collaboration
journal, November 2011

  • Karsch-Mizrachi, I.; Nakamura, Y.; Cochrane, G.
  • Nucleic Acids Research, Vol. 40, Issue D1
  • DOI: 10.1093/nar/gkr1006

The Human Phenotype Ontology in 2017.
text, January 2017

  • Köhler, Sebastian; Vasilevsky, Nicole A.; Engelstad, Mark
  • Apollo - University of Cambridge Repository
  • DOI: 10.17863/cam.23701

Model organism databases: essential resources that need the support of both funders and users
text, January 2021


Overview of the interactive task in BioCreative V
text, January 2016

  • Wang, Qinghua; Rinaldi, Fabio; Jimenez, Silvia
  • Oxford University Press
  • DOI: 10.5167/uzh-129485

Overview of the Interactive Task in BioCreative V
text, January 2015


Galaxy Zoo : Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
text, January 2008


Clinical assessment incorporating a personal genome
journal, May 2010


Biocurators: Contributors to the World of Science
journal, January 2006


Works referencing / citing this record:

A nomenclature and classification for the congenital myasthenic syndromes: preparing for FAIR data in the genomic era
journal, November 2018

  • Thompson, Rachel; Abicht, Angela; Beeson, David
  • Orphanet Journal of Rare Diseases, Vol. 13, Issue 1
  • DOI: 10.1186/s13023-018-0955-7

COPO: a metadata platform for brokering FAIR data in the life sciences
journal, January 2020


Ten quick tips for biocuration
journal, May 2019


ProPheno 1.0: An Online Dataset for Accelerating the Complete Characterization of the Human Protein-Phenotype Landscape in Biomedical Literature
conference, February 2020

  • Pourreza Shahri, Morteza; Kahanda, Indika
  • 2020 IEEE 14th International Conference on Semantic Computing (ICSC)
  • DOI: 10.1109/icsc.2020.00081
