OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Crowdsourcing and curation: perspectives from biology and natural language processing

Abstract

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.

Authors:
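 Hirschman, Lynette [1]; Fort, Karën [2]; Boué, Stéphanie [3]; Kyrpides, Nikos [4]; Islamaj Doğan, Rezarta [5]; Cohen, Kevin Bretonnel [6]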
  1. The MITRE Corporation, Bedford, MA (United States)
  2. Univ. of Paris-Sorbonne, Paris (France). STIH Team
  3. Philip Morris Products S.A., Neuchatel (Switzerland). Philip Morris International R&D
  4. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  5. National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information
  6. Univ. of Colorado, Denver, CO (United States). School of Medicine
Publication Date:
2016-08-08
Research Org.:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); The MITRE Corporation, Bedford, MA (United States); National Inst. of Health (NIH), Bethesda, MD (United States); Univ. of Paris-Sorbonne, Paris (France)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Inst. of Health (NIH) (United States); National Science Foundation (NSF); Institute for Research in Computer Science and Automation (INRIA) (France); Ministry of Culture (France); Philip Morris International (United States)
Contributing Org.:
Philip Morris Products S.A., Neuchatel (Switzerland); Univ. of Colorado, Denver, CO (United States)
OSTI Identifier:
1360095
Grant/Contract Number:
SC0010838; R13-GM109648-01A1; 2R01 LM008111-09A1; LM009254-09; 1R01MH096906-01A1; IIS-1207592
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Database
Additional Journal Information:
Journal Volume: 2016; Journal ID: ISSN 1758-0463
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States: N. p., 2016. Web. doi:10.1093/database/baw115.
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, & Cohen, Kevin Bretonnel. Crowdsourcing and curation: perspectives from biology and natural language processing. United States. doi:10.1093/database/baw115.
Hirschman, Lynette, Fort, Karën, Boué, Stéphanie, Kyrpides, Nikos, Islamaj Doğan, Rezarta, and Cohen, Kevin Bretonnel. 2016. "Crowdsourcing and curation: perspectives from biology and natural language processing". United States. doi:10.1093/database/baw115. https://www.osti.gov/servlets/purl/1360095.
@article{osti_1360095,
title = {Crowdsourcing and curation: perspectives from biology and natural language processing},
author = {Hirschman, Lynette and Fort, Karën and Boué, Stéphanie and Kyrpides, Nikos and Islamaj Doğan, Rezarta and Cohen, Kevin Bretonnel},
abstractNote = {Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.},
doi = {10.1093/database/baw115},
journal = {Database},
volume = 2016,
place = {United States},
year = {2016},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Similar Records:
  • We report that local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g. the entire world), recent establishment (e.g. the name of a newly-opened shopping center) or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates and performs multiscale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with that of six baselines. Finally, we also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.
  • Recent developments in natural-language interfaces between man and computer are reviewed. Particular reference is made to Intellect, a product developed by Artificial Intelligence Corp. Intellect translates typed English requests into formal database query languages, locates and organises the requested information and presents its findings to a user. Even if requests are written in diverse ways, Intellect can respond to them.
  • This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and tested on a sample case. More precisely, it first presents a brief survey of the major works in the area of NLP systems evaluation. Then, after introducing the notion of the life cycle of an NLP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems of the investigation. The tools generally used within computer science to assess the quality of a software system are briefly reviewed, and their applicability to the task of evaluation of NLP systems is discussed. Particular attention is devoted to the concepts of efficiency, correctness, reliability, and adequacy, and to how all of them basically fail to capture the peculiar features of performance evaluation of an NLP system. Two main approaches to performance evaluation, black-box and model-based, are then introduced, and their most important characteristics are presented. Finally, a specific model for performance evaluation proposed by the authors is illustrated, and the results of an experiment with a sample application are reported. The paper concludes with a discussion of research perspectives, open problems, and the importance of performance evaluation to industrial applications.
  • In principle, natural language and knowledge representation are closely related. This paper investigates this by demonstrating how several natural language phenomena, such as definite reference, ambiguity, ellipsis, ill-formed input, figures of speech, and vagueness, require diverse knowledge sources and reasoning. The breadth of kinds of knowledge needed to represent morphology, syntax, semantics, and pragmatics is surveyed. Furthermore, several current issues in knowledge representation, such as logic versus semantic nets, general-purpose versus special-purpose reasoners, adequacy of first-order logic, wait-and-see strategies, and default reasoning, are illustrated in terms of their relation to natural language processing and how natural language impacts these issues.