DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Crowdsourcing and curation: perspectives from biology and natural language processing

Abstract

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.

Authors:
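 Hirschman, Lynette [1]; Fort, Karën [2]; Boué, Stéphanie [3]; Kyrpides, Nikos [4]; Islamaj Doğan, Rezarta [5]; Cohen, Kevin Bretonnel [6]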
  1. The MITRE Corporation, Bedford, MA (United States)
  2. Univ. of Paris-Sorbonne, Paris (France). STIH Team
  3. Philip Morris Products S.A., Neuchatel (Switzerland). Philip Morris International R&D
  4. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  5. National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information
  6. Univ. of Colorado, Denver, CO (United States). School of Medicine
Publication Date:
2016-08-08
Research Org.:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); The MITRE Corporation, Bedford, MA (United States); National Institutes of Health (NIH), Bethesda, MD (United States); Univ. of Paris-Sorbonne, Paris (France)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Inst. of Health (NIH) (United States); National Science Foundation (NSF); Institute for Research in Computer Science and Automation (INRIA) (France); Ministry of Culture (France); Philip Morris International (United States)
Contributing Org.:
Philip Morris Products S.A., Neuchatel (Switzerland); Univ. of Colorado, Denver, CO (United States)
OSTI Identifier:
1360095
Grant/Contract Number:  
SC0010838; R13-GM109648-01A1; 2R01 LM008111-09A1; LM009254-09; 1R01MH096906-01A1; IIS-1207592
Resource Type:
Accepted Manuscript
Journal Name:
Database
Additional Journal Information:
Journal Volume: 2016; Journal ID: ISSN 1758-0463
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States: N. p., 2016. Web. doi:10.1093/database/baw115.
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, & Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States. https://doi.org/10.1093/database/baw115
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. 2016. "Crowdsourcing and curation: perspectives from biology and natural language processing". United States. https://doi.org/10.1093/database/baw115. https://www.osti.gov/servlets/purl/1360095.
@article{osti_1360095,
title = {Crowdsourcing and curation: perspectives from biology and natural language processing},
author = {Hirschman, Lynette and Fort, Karën and Boué, Stéphanie and Kyrpides, Nikos and Islamaj Doğan, Rezarta and Cohen, Kevin Bretonnel},
abstractNote = {Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.},
doi = {10.1093/database/baw115},
journal = {Database},
volume = 2016,
place = {United States},
year = {2016},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 7 works
Citation information provided by Web of Science

Works referencing / citing this record:

Making the right calls in precision oncology
journal, August 2018

  • Bungartz, Kathryn D.; Lalowski, Kristen; Elkin, Sheryl K.
  • Nature Biotechnology, Vol. 36, Issue 8
  • DOI: 10.1038/nbt.4214

Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm
journal, June 2019

  • Pérez-Pérez, Martin; Pérez-Rodríguez, Gael; Blanco-Míguez, Aitor
  • Journal of Cheminformatics, Vol. 11, Issue 1
  • DOI: 10.1186/s13321-019-0363-6

Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
journal, October 2016

  • Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon
  • Nucleic Acids Research, Vol. 45, Issue D1
  • DOI: 10.1093/nar/gkw992
