Title: Crowdsourcing and curation: perspectives from biology and natural language processing

Journal Article · Database
 [1];  [2];  [3];  [4];  [5];  [6]
  1. The MITRE Corporation, Bedford, MA (United States)
  2. Univ. of Paris-Sorbonne, Paris (France). STIH Team
  3. Philip Morris Products S.A., Neuchatel (Switzerland). Philip Morris International R&D
  4. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  5. National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information
  6. Univ. of Colorado, Denver, CO (United States). School of Medicine

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.

Research Organization:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); The MITRE Corporation, Bedford, MA (United States); National Institutes of Health (NIH), Bethesda, MD (United States); Univ. of Paris-Sorbonne, Paris (France)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Inst. of Health (NIH) (United States); National Science Foundation (NSF); Institute for Research in Computer Science and Automation (INRIA) (France); Ministry of Culture (France); Philip Morris International (United States)
Contributing Organization:
Philip Morris Products S.A., Neuchatel (Switzerland); Univ. of Colorado, Denver, CO (United States)
Grant/Contract Number:
SC0010838; R13-GM109648-01A1; 2R01 LM008111-09A1; LM009254-09; 1R01MH096906-01A1; IIS-1207592
OSTI ID:
1360095
Journal Information:
Database, Vol. 2016; ISSN 1758-0463
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 7 works
Citation information provided by
Web of Science
