Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction
- ORNL
Deep learning has surged in popularity and proven effective for various artificial intelligence applications, including information extraction from cancer pathology reports. Because word representations are the core units that allow deep learning algorithms to process words and perform NLP tasks, they should carry as much information as possible to help these algorithms achieve high classification performance. Therefore, in this work, in addition to the distributional information of words in large corpora, we use UMLS vocabulary resources to enrich the vector space representation of words with the semantic relations between them. These resources provide many terminologies pertaining to cancer. The refined word embeddings are used with a convolutional neural network (CNN) model to extract four data elements from cancer pathology reports: ICD-O-3 tumor topography codes, tumor laterality, behavior, and histological grade. We observed that CNN models whose word embeddings were enriched with UMLS vocabulary resources consistently outperformed CNN models without pre-trained word embeddings, and even CNN models with word embeddings pre-trained on a domain-specific corpus, across all four tasks. The results show a marginal improvement on the laterality task but a significant improvement on the other tasks, especially in macro-F score. Specifically, the improvements are 3%, 13%, and 15% for the tumor site, histological grade, and behavior tasks, respectively. These results encourage enriching word embeddings with more clinical data resources for information abstraction tasks on clinical pathology reports.
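The abstract does not spell out how the embeddings are refined. As a rough illustration only, the sketch below assumes the commonly used iterative retrofitting update (as in "Retrofitting Word Vectors to Semantic Lexicons"), with the original-vector weight set to 1 and neighbour weights set to 1/degree. The function name `retrofit`, the toy vectors, and the toy lexicon are hypothetical and are not taken from UMLS or from the authors' code.

```python
def retrofit(embeddings, lexicon, iterations=10):
    """Pull each vector toward its lexicon neighbours while keeping it
    close to its original (distributional) position.

    embeddings: dict mapping word -> list of floats
    lexicon:    dict mapping word -> list of semantically related words
    Assumed weights: alpha_i = 1, beta_ij = 1 / (number of usable neighbours).
    """
    original = {w: list(v) for w, v in embeddings.items()}
    refined = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            if word not in refined:
                continue  # no vector to refine
            linked = [n for n in neighbours if n in refined and n != word]
            if not linked:
                continue  # nothing to pull toward
            dim = len(original[word])
            # Mean of the neighbours' current (already refined) vectors.
            mean_nb = [sum(refined[n][k] for n in linked) / len(linked)
                       for k in range(dim)]
            # Equal-weight average of the original vector and the neighbour mean.
            refined[word] = [(original[word][k] + mean_nb[k]) / 2.0
                             for k in range(dim)]
    return refined


# Toy data (hypothetical): two related terms and one unrelated term.
toy_vectors = {
    "carcinoma": [1.0, 0.0],
    "cancer": [0.0, 1.0],
    "left": [-1.0, 0.0],
}
toy_lexicon = {"carcinoma": ["cancer"], "cancer": ["carcinoma"]}

refined_vectors = retrofit(toy_vectors, toy_lexicon)
```

In this toy setup the two linked terms move toward each other, while "left", which has no lexicon entry, keeps its original vector. In practice the refined vectors would initialise the embedding layer of the downstream classifier.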
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1491322
- Country of Publication:
- United States
- Language:
- English