Title: Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports

Abstract

Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports on an annual basis. In this study, we investigated an automated approach using coarse-to-fine training of convolutional neural networks (CNNs) for extracting the primary site, histological grade, and laterality from unstructured cancer pathology text reports. Our proposed training scheme consists of two stages. In the first stage, multi-task learning (MTL) with hard parameter sharing is used to train a multi-task CNN (MT-CNN) model for all the tasks. In the second stage, the MT-CNN parameters are used to initialize a CNN model for each task, which is then fine-tuned individually using its corresponding dataset. The performance of our proposed model was compared against a state-of-the-art CNN and the commonly used SVM classifier. We observed that the proposed model consistently outperformed the baseline models, especially for the less prevalent classes. Specifically, the proposed training approach achieved a micro-F score of 0.7749 over 12 ICD-O-3 topography codes, which is a significant improvement compared with the state-of-the-art CNN (0.7101) and the SVM (0.6019) classifiers. Also, the results demonstrate the potential of the proposed method for handling class imbalance within each task. It significantly improves the macro-F score by 24% and 12% on the primary site and histological grade tasks, respectively.
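
The abstract reports both micro-F and macro-F scores, and the difference matters for imbalanced classes. The following is a minimal sketch, not the authors' evaluation code: it assumes single-label classification and the standard F1 definitions, and the function name `f_scores` and the toy labels are hypothetical. Micro-F pools counts over all classes, so frequent classes dominate it; macro-F averages the per-class scores, so a missed rare class lowers it sharply.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f, macro_f) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: a frequent class and a rare class.
# The classifier ignores the rare class entirely.
y_true = ["C50"] * 8 + ["C34"] * 2
y_pred = ["C50"] * 10
micro, macro = f_scores(y_true, y_pred)
print(f"micro-F = {micro:.4f}, macro-F = {macro:.4f}")
```

In this toy case the micro-F stays high while the macro-F drops to roughly half of it, which is why a gain in macro-F is a reasonable indicator of better handling of less prevalent classes.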

Authors:
Alawad, Mohammed M. [1]; Yoon, Hong-Jun [1]; Tourassi, Georgia [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Laboratory, Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1435267
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: International Conference on Biomedical and Health Informatics - Las Vegas, Nevada, United States of America - 3/4/2018-3/7/2018
Country of Publication:
United States
Language:
English

Citation Formats

Alawad, Mohammed M., Yoon, Hong-Jun, and Tourassi, Georgia. Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports. United States: N. p., 2018. Web. doi:10.1109/BHI.2018.8333408.
Alawad, Mohammed M., Yoon, Hong-Jun, & Tourassi, Georgia. Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports. United States. doi:10.1109/BHI.2018.8333408.
Alawad, Mohammed M., Yoon, Hong-Jun, and Tourassi, Georgia. 2018. "Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports". United States. doi:10.1109/BHI.2018.8333408. https://www.osti.gov/servlets/purl/1435267.
@article{osti_1435267,
title = {Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports},
author = {Alawad, Mohammed M. and Yoon, Hong-Jun and Tourassi, Georgia},
abstractNote = {Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports on an annual basis. In this study, we investigated an automated approach using coarse-to-fine training of convolutional neural networks (CNNs) for extracting the primary site, histological grade, and laterality from unstructured cancer pathology text reports. Our proposed training scheme consists of two stages. In the first stage, multi-task learning (MTL) with hard parameter sharing is used to train a multi-task CNN (MT-CNN) model for all the tasks. In the second stage, the MT-CNN parameters are used to initialize a CNN model for each task, which is then fine-tuned individually using its corresponding dataset. The performance of our proposed model was compared against a state-of-the-art CNN and the commonly used SVM classifier. We observed that the proposed model consistently outperformed the baseline models, especially for the less prevalent classes. Specifically, the proposed training approach achieved a micro-F score of 0.7749 over 12 ICD-O-3 topography codes, which is a significant improvement compared with the state-of-the-art CNN (0.7101) and the SVM (0.6019) classifiers. Also, the results demonstrate the potential of the proposed method for handling class imbalance within each task. It significantly improves the macro-F score by 24% and 12% on the primary site and histological grade tasks, respectively.},
doi = {10.1109/BHI.2018.8333408},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {3}
}
