Crowdsourcing and curation: perspectives from biology and natural language processing
Abstract
Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.
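- Authors:
- Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel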
- The MITRE Corporation, Bedford, MA (United States)
- Univ. of Paris-Sorbonne, Paris (France). STIH Team
- Philip Morris Products S.A., Neuchatel (Switzerland). Philip Morris International R&D
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information
- Univ. of Colorado, Denver, CO (United States). School of Medicine
- Publication Date:
- 2016-08-08
- Research Org.:
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); The MITRE Corporation, Bedford, MA (United States); National Institutes of Health (NIH), Bethesda, MD (United States); Univ. of Paris-Sorbonne, Paris (France)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER); National Inst. of Health (NIH) (United States); National Science Foundation (NSF); Institute for Research in Computer Science and Automation (INRIA) (France); Ministry of Culture (France); Philip Morris International (United States)
- Contributing Org.:
- Philip Morris Products S.A., Neuchatel (Switzerland); Univ. of Colorado, Denver, CO (United States)
- OSTI Identifier:
- 1360095
- Grant/Contract Number:
- SC0010838; R13-GM109648-01A1; 2R01 LM008111-09A1; LM009254-09; 1R01MH096906-01A1; IIS-1207592
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Database
- Additional Journal Information:
- Journal Volume: 2016; Journal ID: ISSN 1758-0463
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 96 KNOWLEDGE MANAGEMENT AND PRESERVATION
Citation Formats
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States: N. p., 2016. Web. doi:10.1093/database/baw115.
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, & Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States. https://doi.org/10.1093/database/baw115
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. 2016. "Crowdsourcing and curation: perspectives from biology and natural language processing". United States. https://doi.org/10.1093/database/baw115. https://www.osti.gov/servlets/purl/1360095.
@article{osti_1360095,
title = {Crowdsourcing and curation: perspectives from biology and natural language processing},
author = {Hirschman, Lynette and Fort, Karën and Boué, Stéphanie and Kyrpides, Nikos and Islamaj Doğan, Rezarta and Cohen, Kevin Bretonnel},
abstractNote = {Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.},
doi = {10.1093/database/baw115},
journal = {Database},
volume = 2016,
place = {United States},
year = {2016},
month = {Aug}
}