DOE PAGES

Title: Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
Authors:
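 Singhal, Ayush [1]; Leaman, Robert [1]; Catlett, Natalie [2]; Lemberger, Thomas [3]; McEntyre, Johanna [4]; Polson, Shawn [5]; Xenarios, Ioannis [6]; Arighi, Cecilia [7]; Lu, Zhiyong [1]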
  1. National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information
  2. Selventa, Cambridge, MA (United States)
  3. European Molecular Biology Organization (EMBO), Heidelberg (Germany)
  4. European Molecular Biology Lab. (EMBL), Hinxton (United Kingdom). European Bioinformatics Inst. (EMBL-EBI)
  5. Univ. of Delaware, Newark, DE (United States). Delaware Biotechnology Inst. Dept. of Computer and Information Sciences. Center for Bioinformatics and Computational Biology
  6. SIB Swiss Inst. of Bioinformatics, Lausanne (Switzerland)
  7. National Inst. of Health (NIH), Bethesda, MD (United States). National Library of Medicine. National Center for Biotechnology Information; Univ. of Delaware, Newark, DE (United States). Delaware Biotechnology Inst. Dept. of Computer and Information Sciences. Center for Bioinformatics and Computational Biology
Publication Date:
December 2016
Grant/Contract Number:
SC0010838; R13-GM109648-01A1; P20-GM103446; DBI-1356374; 098231/Z/12/Z
Type:
Accepted Manuscript
Journal Name:
Database
Additional Journal Information:
Journal Volume: 2016; Journal Issue: 0; Journal ID: ISSN 1758-0463
Publisher:
Oxford University Press
Research Org:
Univ. of Delaware, Newark, DE (United States); National Inst. of Health (NIH), Bethesda, MD (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Inst. of Health (NIH) (United States); National Science Foundation (NSF); The Robert Bosch Foundation (Germany); European Molecular Biology Organization (EMBO) (Germany); Wellcome Trust (United Kingdom)
Contributing Orgs:
Selventa, Cambridge, MA (United States); SIB Swiss Inst. of Bioinformatics, Lausanne (Switzerland); European Molecular Biology Lab. (EMBL), Hinxton (United Kingdom); European Molecular Biology Organization (EMBO), Heidelberg (Germany)
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; 59 BASIC BIOLOGICAL SCIENCES
OSTI Identifier:
1360097

Singhal, Ayush, Leaman, Robert, Catlett, Natalie, Lemberger, Thomas, McEntyre, Johanna, Polson, Shawn, Xenarios, Ioannis, Arighi, Cecilia, and Lu, Zhiyong. Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges. United States: N. p., 2016. Web. doi:10.1093/database/baw161.
Singhal, Ayush, Leaman, Robert, Catlett, Natalie, Lemberger, Thomas, McEntyre, Johanna, Polson, Shawn, Xenarios, Ioannis, Arighi, Cecilia, & Lu, Zhiyong. Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges. United States. doi:10.1093/database/baw161.
Singhal, Ayush, Leaman, Robert, Catlett, Natalie, Lemberger, Thomas, McEntyre, Johanna, Polson, Shawn, Xenarios, Ioannis, Arighi, Cecilia, and Lu, Zhiyong. 2016. "Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges". United States. doi:10.1093/database/baw161. https://www.osti.gov/servlets/purl/1360097.
@article{osti_1360097,
title = {Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges},
author = {Singhal, Ayush and Leaman, Robert and Catlett, Natalie and Lemberger, Thomas and McEntyre, Johanna and Polson, Shawn and Xenarios, Ioannis and Arighi, Cecilia and Lu, Zhiyong},
abstractNote = {Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. In conclusion, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.},
doi = {10.1093/database/baw161},
journal = {Database},
number = 0,
volume = 2016,
place = {United States},
year = {2016},
month = {12}
}