Towards a semantic lexicon for biological language processing
- Karin
It is well understood that natural language processing (NLP) applications require sophisticated lexical resources to support their processing goals. In the biomedical domain, we are privileged to have access to extensive terminological resources in the form of controlled vocabularies and ontologies, which have been integrated into the Metathesaurus of the National Library of Medicine's Unified Medical Language System (UMLS). However, the existence of such terminological resources does not guarantee their utility for NLP. Beyond a basic enumeration of important domain terms, we have two core requirements for lexical resources for NLP. The first is representation of morphosyntactic information about those terms, specifically part-of-speech information and inflectional patterns to support parsing and lemma assignment. The second is representation of semantic information, indicating general categorical information about terms and significant relations between terms, to support text understanding and inference (Hahn et al., 1999). Biomedical vocabularies by and large leave out morphosyntactic information, and where they address semantic considerations, they often do so in an unprincipled manner, for instance by indicating a relation between two concepts without indicating the type of that relation. But all is not lost. The UMLS knowledge sources include two additional relevant resources: the SPECIALIST lexicon, which addresses our morphosyntactic requirements, and the Semantic Network, a representation of core conceptual categories in the biomedical domain. It is not entirely clear, however, how well these two knowledge sources cover the full content of the Metathesaurus. Furthermore, when our goal is specifically to process biological text (and often, more specifically, text in the molecular biology domain), it is difficult to say whether the coverage of these resources is meaningful.
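To make the two requirements concrete, the following is a minimal sketch of a lexicon entry that carries both kinds of information (part of speech plus inflected forms, and semantic categories), together with naive lemma assignment by lookup of inflected forms. The class and function names (`LexEntry`, `build_lemma_index`, `assign_lemmas`) and the toy entries are hypothetical illustrations, not the SPECIALIST lexicon's actual record format; the semantic category strings merely mimic the style of Semantic Network type names.

```python
from dataclasses import dataclass, field


@dataclass
class LexEntry:
    """Hypothetical lexicon entry combining morphosyntactic and semantic information."""
    base: str                  # lemma (base form)
    pos: str                   # part of speech, e.g. "noun" or "verb"
    inflections: list[str]     # inflected surface forms, including the base form
    semantic_types: list[str] = field(default_factory=list)  # general categories


def build_lemma_index(entries):
    """Map each lowercased inflected form to the (lemma, pos) analyses that produce it."""
    index = {}
    for entry in entries:
        for form in entry.inflections:
            index.setdefault(form.lower(), []).append((entry.base, entry.pos))
    return index


def assign_lemmas(tokens, index):
    """Return (token, lemma, pos) triples; tokens missing from the lexicon get (None, None).

    This is deliberately naive: when a form has several analyses, the first one is
    taken without any part-of-speech disambiguation.
    """
    result = []
    for token in tokens:
        analyses = index.get(token.lower())
        lemma, pos = analyses[0] if analyses else (None, None)
        result.append((token, lemma, pos))
    return result


# Toy entries (assumed for illustration, not drawn from the UMLS)
entries = [
    LexEntry("protein", "noun", ["protein", "proteins"], ["Amino Acid, Peptide, or Protein"]),
    LexEntry("bind", "verb", ["bind", "binds", "bound", "binding"]),
    LexEntry("gene", "noun", ["gene", "genes"], ["Gene or Genome"]),
]
index = build_lemma_index(entries)
print(assign_lemmas(["Proteins", "bound", "genes", "kinase"], index))
# [('Proteins', 'protein', 'noun'), ('bound', 'bind', 'verb'), ('genes', 'gene', 'noun'), ('kinase', None, None)]
```

The unknown token `kinase` shows the coverage problem the text raises: a term absent from the lexicon receives no morphosyntactic analysis at all.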
The utility of the UMLS knowledge sources for medical language processing (MLP) has been explored (Johnson, 1999; Friedman et al., 2001); the time has now come to repeat these experiments for biological language processing (BLP). To that end, this paper presents an analysis of the UMLS resources, specifically with an eye towards constructing lexical resources suitable for BLP. We follow the paradigm presented in Johnson (1999) for medical language, exploring the overlap between the UMLS Metathesaurus and the SPECIALIST lexicon to construct a morphosyntactically and semantically specified lexicon, and then further exploring the overlap with a relevant domain corpus for molecular biology.
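The overlap-based construction described above can be sketched roughly as follows, under simplifying assumptions: Metathesaurus terms are given as a dictionary from term string to semantic types, SPECIALIST entries as a dictionary from term string to part of speech, and terms are matched by lowercased exact string. The function names and all data are hypothetical; the actual analysis would read the UMLS distribution files and would likely need more careful string normalization.

```python
from collections import Counter


def build_lexicon(meta_terms, specialist_pos):
    """Keep Metathesaurus terms that also appear in the SPECIALIST lexicon.

    Each resulting entry pairs part-of-speech information (from SPECIALIST)
    with semantic types (from the Metathesaurus / Semantic Network).
    Matching is by lowercased exact string, an assumed simplification.
    """
    lexicon = {}
    for term, sem_types in meta_terms.items():
        key = term.lower()
        if key in specialist_pos:
            lexicon[key] = {"pos": specialist_pos[key], "semantic_types": sorted(sem_types)}
    return lexicon


def corpus_coverage(lexicon, corpus_terms):
    """Return (token coverage, type coverage) of the lexicon over a list of corpus terms."""
    counts = Counter(term.lower() for term in corpus_terms)
    if not counts:
        return 0.0, 0.0
    covered = [term for term in counts if term in lexicon]
    token_cov = sum(counts[term] for term in covered) / sum(counts.values())
    type_cov = len(covered) / len(counts)
    return token_cov, type_cov


# Toy inputs (hypothetical, not real UMLS content)
meta_terms = {
    "Protein": {"Amino Acid, Peptide, or Protein"},
    "gene": {"Gene or Genome"},
    "apoptosis": {"Cell Function"},
    "p53": {"Gene or Genome"},
}
specialist_pos = {"protein": "noun", "gene": "noun", "apoptosis": "noun", "bind": "verb"}
corpus = ["protein", "gene", "protein", "p53", "apoptosis", "kinase"]

lexicon = build_lexicon(meta_terms, specialist_pos)
print(sorted(lexicon))                    # ['apoptosis', 'gene', 'protein']
print(corpus_coverage(lexicon, corpus))   # token coverage 4/6, type coverage 3/5
```

Reporting both token and type coverage is a design choice: frequent terms such as `protein` inflate token coverage, while type coverage shows how much of the corpus vocabulary (here `p53` and `kinase`) the combined lexicon misses.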
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 977640
- Report Number(s):
- LA-UR-04-3190; TRN: US201012%%622
- Resource Relation:
- Journal Volume: 6; Journal Issue: 1-2; Conference: Submitted to: ISMB BioLINK, Glasgow, Scotland, July 29, 2004
- Country of Publication:
- United States
- Language:
- English
- Gene Ontology: tool for the unification of biology (journal, May 2000)
- How knowledge drives understanding—matching medical ontologies with the needs of medical language processing (journal, January 1999)
- A Semantic Lexicon for Medical Language Processing (journal, May 1999)
- Identifying named entities from PubMed® for enriching semantic categories (journal, February 2015)
- UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text (journal, August 2010)
- The BioLexicon: a large-scale terminological resource for biomedical text mining (journal, October 2011)
- Ontology quality assurance through analysis of term transformations (journal, May 2009)