OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-Task Convolutional Neural Networks for Natural Text Classification

Abstract

Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports every year. We investigated an automated approach using convolutional neural networks (CNNs) to extract the primary site, laterality, histology, behavior, and histological grade from unstructured cancer pathology text reports. We use multi-task learning (MTL) with hard parameter sharing to train a single multi-task CNN (MT-CNN) model for all five tasks. We compared the performance of the proposed approach against a state-of-the-art CNN and the commonly used SVM classifier. The proposed model consistently outperformed the baseline models, especially for the less prevalent classes. The results also demonstrate the potential of the proposed method for handling class imbalance within each task.
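
The hard parameter sharing idea in the abstract can be illustrated with a minimal, forward-pass-only sketch in plain Python. It is not the released software: the class counts in `TASKS`, the layer sizes, the `MTCNNSketch` class, and the summed cross-entropy in `joint_loss` are all assumptions chosen for illustration. The point it shows is that one embedding and one convolutional layer are shared, while each task has only its own small output layer.

```python
import math
import random

# Hypothetical number of classes per task (not taken from the record).
TASKS = {"site": 4, "laterality": 3, "histology": 5, "behavior": 2, "grade": 4}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MTCNNSketch:
    """Shared embedding + 1D convolution, with one linear head per task."""

    def __init__(self, vocab_size, embed_dim=8, n_filters=6, width=3, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.width = width
        # Shared parameters: used by every task (hard parameter sharing).
        self.embed = rand_matrix(vocab_size, embed_dim)
        self.filters = rand_matrix(n_filters, width * embed_dim)
        # Task-specific parameters: one output layer per task.
        self.heads = {t: rand_matrix(k, n_filters) for t, k in TASKS.items()}

    def shared_features(self, token_ids):
        # Requires len(token_ids) >= self.width.
        vecs = [self.embed[i] for i in token_ids]
        # Concatenate each window of `width` consecutive word vectors.
        windows = [sum(vecs[s:s + self.width], [])
                   for s in range(len(vecs) - self.width + 1)]
        # ReLU activation followed by max-over-time pooling per filter.
        return [max(max(0.0, dot(f, w)) for w in windows) for f in self.filters]

    def predict(self, token_ids):
        feats = self.shared_features(token_ids)  # computed once for all tasks
        return {t: softmax([dot(row, feats) for row in W])
                for t, W in self.heads.items()}


def joint_loss(probs, labels):
    # Assumed training objective: sum of per-task cross-entropy losses.
    return sum(-math.log(probs[t][labels[t]]) for t in probs)


model = MTCNNSketch(vocab_size=50)
probs = model.predict([3, 17, 8, 42, 5, 9])  # token ids of one toy report
loss = joint_loss(probs, {t: 0 for t in TASKS})
```

Because the gradient of such a summed loss would flow into the shared layers from every task, rarer classes in one task can benefit from features learned for the others, which is one common motivation for this design.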

Authors:
Yoon, Hong Jun [1]; Alawad, Mohammed M [1]; Tourassi, Georgia [1]
  1. Oak Ridge National Laboratory
Publication Date:
2018-02-05
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1503170
Report Number(s):
Multi-Task Convolutional Neural Networks for Natur; 005841WKSTN00
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Software
Software Revision:
00
Software Package Number:
005841
Software CPU:
WKSTN
Open Source:
Yes
Source Code Available:
Yes
Country of Publication:
United States

Citation Formats

MLA: Yoon, Hong Jun, Alawad, Mohammed M, and Tourassi, Georgia. Multi-Task Convolutional Neural Networks for Natural Text Classification. Computer software. https://www.osti.gov//servlets/purl/1503170. Vers. 00. USDOE. 5 Feb. 2018. Web.
APA: Yoon, Hong Jun, Alawad, Mohammed M, & Tourassi, Georgia. (2018, February 5). Multi-Task Convolutional Neural Networks for Natural Text Classification (Version 00) [Computer software]. https://www.osti.gov//servlets/purl/1503170.
Chicago: Yoon, Hong Jun, Alawad, Mohammed M, and Tourassi, Georgia. Multi-Task Convolutional Neural Networks for Natural Text Classification. Computer software. Version 00. February 5, 2018. https://www.osti.gov//servlets/purl/1503170.
BibTeX:
@misc{osti_1503170,
title = {Multi-Task Convolutional Neural Networks for Natural Text Classification, Version 00},
author = {Yoon, Hong Jun and Alawad, Mohammed M and Tourassi, Georgia},
abstractNote = {Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports every year. We investigated an automated approach using convolutional neural networks (CNNs) to extract the primary site, laterality, histology, behavior, and histological grade from unstructured cancer pathology text reports. We use multi-task learning (MTL) with hard parameter sharing to train a single multi-task CNN (MT-CNN) model for all five tasks. We compared the performance of the proposed approach against a state-of-the-art CNN and the commonly used SVM classifier. The proposed model consistently outperformed the baseline models, especially for the less prevalent classes. The results also demonstrate the potential of the proposed method for handling class imbalance within each task.},
url = {https://www.osti.gov//servlets/purl/1503170},
doi = {},
year = {2018},
month = {2},
note = {}
}