Multimodal Data Representation with Deep Learning for Extracting Cancer Characteristics from Clinical Text
Conference · OSTI ID: 1737476
- ORNL
- LSUHSC-Louisiana Tumor Registry
- University of Kentucky
- University of Utah
- Rutgers Cancer Institute of New Jersey
- National Cancer Institute, Bethesda, MD
This paper presents a multimodal data representation that improves the performance of deep learning (DL) models for extracting key cancer characteristics from unstructured text in pathology reports. In addition to the report text, the models take concept unique identifiers (CUIs) as a second input. We analyze different text and CUI data representations, including word embeddings and bag of embeddings (BOE), with a convolutional neural network (CNN) and a fully connected multilayer perceptron neural network (MLP-NN). The high-level document embeddings from the text and CUI inputs are concatenated and passed to a classifier. The model is used to extract cancer subsite and histology from pathology reports. Both classification tasks have a large number of labels (317 for subsite and 556 for histology) and extreme class imbalance. We compare the developed DL models on the two tasks using micro- and macro-F1 scores. The evaluation shows that a multi-channel DL model that uses text represented by word embeddings and CUIs represented by BOE outperforms the other DL models. This approach also significantly improves performance on low-prevalence classes.
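The abstract evaluates models with both micro- and macro-F1, which behave very differently under class imbalance. The following is a minimal sketch of the two metrics for single-label classification, not the authors' evaluation code. The function name `f1_scores` and the toy labels are assumptions made for illustration; the label strings are only placeholders for subsite codes.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-class F1, so every class counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: one frequent class and one rare class.
y_true = ["C50.9"] * 8 + ["C34.1"] * 2
y_pred = ["C50.9"] * 10  # a model that ignores the rare class
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the model never predicts the rare class, yet micro-F1 stays high while macro-F1 drops sharply. This is why macro-F1 is the more informative score for the low-prevalence classes mentioned in the abstract.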
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1737476
- Country of Publication: United States
- Language: English