U.S. Department of Energy
Office of Scientific and Technical Information

Multimodal Data Representation with Deep Learning for Extracting Cancer Characteristics from Clinical Text

Conference · OSTI ID:1737476
This paper presents a multimodal data representation that improves the performance of deep learning (DL) models for extracting key cancer characteristics from unstructured text in pathology reports. Specifically, in addition to the report text, we give the models concept unique identifiers (CUIs) as a second source of information. We analyze the performance of different text and CUI data representations, including word embeddings and bag of embeddings (BOE), with a convolutional neural network (CNN) and a fully connected multilayer perceptron neural network (MLP-NN). The high-level document embeddings from the text and CUI inputs are concatenated and passed to a classifier. The model is used to extract cancer subsite and histology from pathology reports. Both classification tasks have a large number of labels (317 for subsite and 556 for histology) with extreme class imbalance. We compare the performance of the developed DL models across the two tasks based on micro- and macro-F1 scores. The evaluation shows that a multi-channel DL model that uses text represented by word embeddings and CUIs represented by BOE outperforms the other DL models. This approach also significantly improves model performance on low-prevalence classes.
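The two-channel design described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: it assumes the text channel is a single 1-D convolution with max-over-time pooling, the CUI channel is a BOE average followed by one ReLU layer, and bias terms are omitted. All names, sizes, and the random weights are hypothetical, chosen only to make the concatenation step concrete.

```python
import random

random.seed(0)  # reproducible toy weights

def rand_matrix(rows, cols):
    """Small random weight matrix as a list of row vectors."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, weights):
    """Matrix-vector product: one output per row of `weights` (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def bag_of_embeddings(ids, table):
    """BOE: average the embedding vectors of the given ids."""
    dim = len(table[0])
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]

def conv_max_pool(ids, table, filters, width):
    """1-D convolution over word embeddings, ReLU, then max-over-time pooling."""
    seq = [table[i] for i in ids]
    pooled = []
    for f in filters:
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(seq) - width + 1):
            window = [x for vec in seq[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        pooled.append(best)
    return pooled

# Hypothetical toy sizes; the abstract does not give the real dimensions.
WORD_VOCAB, CUI_VOCAB, DIM = 50, 20, 8
N_FILTERS, WIDTH, HIDDEN, N_CLASSES = 6, 3, 5, 4

word_table = rand_matrix(WORD_VOCAB, DIM)   # word embedding table
cui_table = rand_matrix(CUI_VOCAB, DIM)     # CUI embedding table
filters = rand_matrix(N_FILTERS, WIDTH * DIM)
mlp_weights = rand_matrix(HIDDEN, DIM)
clf_weights = rand_matrix(N_CLASSES, N_FILTERS + HIDDEN)

def predict(word_ids, cui_ids):
    """Return the concatenated document vector and the predicted class index."""
    text_vec = conv_max_pool(word_ids, word_table, filters, WIDTH)  # text channel
    cui_vec = relu(linear(bag_of_embeddings(cui_ids, cui_table), mlp_weights))  # CUI channel
    doc_vec = text_vec + cui_vec  # concatenate the two document embeddings
    logits = linear(doc_vec, clf_weights)
    return doc_vec, logits.index(max(logits))

doc_vec, label = predict([3, 17, 42, 8, 25], [1, 4, 4, 19])
print(len(doc_vec), label)
```

In a trained model the weights would be learned jointly, and the classifier would have 317 or 556 outputs depending on the task; the sketch only shows how the two channels produce separate document vectors that are joined before classification.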
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1737476
Country of Publication:
United States
Language:
English

Similar Records

Information Extraction from Cancer Pathology Reports with Graph Convolution Networks for Natural Language Texts
Conference · Sat Nov 30 23:00:00 EST 2019 · OSTI ID:1606856

Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction
Conference · Fri Nov 30 23:00:00 EST 2018 · 2018 IEEE International Conference on Big Data (Big Data) · OSTI ID:1567566

Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction
Conference · Fri Nov 30 23:00:00 EST 2018 · OSTI ID:1491322
