Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks

Alawad, Mohammed; Gao, Shang; Qiu, John X.; Yoon, Hong Jun; Blair Christian, J.; Penberthy, Lynne; Mumphrey, Brent; Wu, Xiao-Cheng; Coyle, Linda; Tourassi, Georgia

doi:10.1093/jamia/ocz153

Journal Article · 9 November 2019 · Journal of the American Medical Informatics Association

DOI: https://doi.org/10.1093/jamia/ocz153 · OSTI ID: 1574032

Alawad, Mohammed [1]; Gao, Shang [1]; Qiu, John X. [1]; Yoon, Hong Jun [1]; Blair Christian, J. [1]; Penberthy, Lynne [2]; Mumphrey, Brent [3]; Wu, Xiao-Cheng [3]; Coyle, Linda [4]; Tourassi, Georgia [1]

[1] Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
[2] Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA
[3] Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, Louisiana, USA
[4] Information Management Services Inc, Calverton, Maryland, USA

Objective: In this work, we implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show that learning related information extraction (IE) tasks jointly, with representations shared across the tasks, is important for achieving state-of-the-art classification accuracy and computational efficiency.

Materials and Methods: The multitask CNN (MTCNN) approaches document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the MTCNN models against single-task CNN models and 2 traditional machine learning approaches: support vector machine (SVM) and random forest classifier (RFC).

Results: Across all 5 tasks, MTCNNs achieved higher classification accuracy than the other machine learning models. In retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports, respectively, across all 5 tasks, compared with 53.68% (single-task CNN), 46.37% (RFC), and 36.75% (SVM) for the baseline models. In prospective evaluation, the percentages of cases correctly classified across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency, using about the same number of trainable parameters as a single-task CNN.

Conclusions: The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks, while keeping training and inference times similar to those of a single task-specific model.
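Hard parameter sharing, the first MTL technique named in the Objective, means that every task runs through one shared feature extractor and the tasks differ only in their output layers. The sketch below illustrates the idea in plain Python, assuming a common word-level CNN layout (word embeddings, parallel convolutions with several window widths, max-over-time pooling) topped by one softmax head per task, using the five class counts from the abstract. The layer sizes, window widths, padding index 0, and all names are hypothetical; this is not the authors' implementation.

```python
import math
import random

# Output classes per task, taken from the abstract.
TASKS = {"site": 65, "laterality": 4, "behavior": 3, "histology": 63, "grade": 5}


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HardSharingCNN:
    """Toy word-level CNN with one shared trunk and one softmax head per task.

    All sizes are hypothetical and far smaller than any real configuration.
    """

    def __init__(self, vocab_size, emb_dim=8, widths=(3, 4, 5), n_filters=4, seed=0):
        rng = random.Random(seed)
        self.widths = widths
        # Shared parameters: word embeddings and one filter bank per window width.
        self.embedding = rand_matrix(vocab_size, emb_dim, rng)
        self.filters = {
            w: (rand_matrix(n_filters, w * emb_dim, rng), [0.0] * n_filters)
            for w in widths
        }
        feat_dim = len(widths) * n_filters
        # Task-specific parameters: one linear layer per task on the shared features.
        self.heads = {
            task: (rand_matrix(n_classes, feat_dim, rng), [0.0] * n_classes)
            for task, n_classes in TASKS.items()
        }

    def shared_features(self, token_ids):
        """Embedding lookup -> 1D convolution with ReLU -> max-over-time pooling."""
        # Pad with token id 0 (assumed padding index) so every filter fits at least once.
        token_ids = list(token_ids) + [0] * max(0, max(self.widths) - len(token_ids))
        vectors = [self.embedding[t] for t in token_ids]
        features = []
        for w in self.widths:
            weights, biases = self.filters[w]
            # Each window concatenates w consecutive word vectors.
            windows = [
                [x for vec in vectors[i:i + w] for x in vec]
                for i in range(len(vectors) - w + 1)
            ]
            for row, b in zip(weights, biases):
                acts = [max(0.0, sum(a * x for a, x in zip(row, win)) + b) for win in windows]
                features.append(max(acts))
        return features

    def predict(self, token_ids):
        """One shared forward pass; every task head reads the same features."""
        feats = self.shared_features(token_ids)
        probs = {}
        for task, (weights, biases) in self.heads.items():
            logits = [sum(a * f for a, f in zip(row, feats)) + b for row, b in zip(weights, biases)]
            probs[task] = softmax(logits)
        return probs

    def num_parameters(self):
        """Trainable parameters; the shared trunk is counted once for all tasks."""
        total = sum(len(row) for row in self.embedding)
        for weights, biases in list(self.filters.values()) + list(self.heads.values()):
            total += sum(len(row) for row in weights) + len(biases)
        return total


model = HardSharingCNN(vocab_size=50)
probs = model.predict([3, 17, 8, 42, 5, 9])
print({task: len(p) for task, p in probs.items()})
```

Because the embedding and convolution weights are shared, each additional task adds only one linear head. That is one plausible intuition for the abstract's observation that a hard parameter sharing MTCNN needs about as many trainable parameters as a single-task CNN, since in realistic configurations the shared embeddings and filters dominate the count.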

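The abstract names cross-stitch as the second MTL technique without describing the unit itself. Below is a minimal sketch of the general cross-stitch idea, under stated assumptions: each task keeps its own feature vector at some layer, and a learnable T × T matrix `alpha` mixes them, so task i's next-layer input is the sum over j of `alpha[i][j]` times task j's features. Because the mix is linear in `alpha`, its gradient is a simple inner product, which is what lets backpropagation learn how much each task borrows from the others. The function names, the 0.9 self-weight initialization, and the feature sizes are illustrative assumptions, not details taken from the paper.

```python
def init_alpha(num_tasks, self_weight=0.9):
    """Cross-stitch mixing matrix whose rows sum to 1 and favour the task itself."""
    if num_tasks == 1:
        return [[1.0]]
    other = (1.0 - self_weight) / (num_tasks - 1)
    return [
        [self_weight if i == j else other for j in range(num_tasks)]
        for i in range(num_tasks)
    ]


def cross_stitch(features, alpha):
    """Mix per-task feature vectors: out[i][k] = sum_j alpha[i][j] * features[j][k]."""
    num_tasks, dim = len(features), len(features[0])
    if len(alpha) != num_tasks or any(len(row) != num_tasks for row in alpha):
        raise ValueError("alpha must be a num_tasks x num_tasks matrix")
    return [
        [sum(alpha[i][j] * features[j][k] for j in range(num_tasks)) for k in range(dim)]
        for i in range(num_tasks)
    ]


def alpha_gradient(features, upstream):
    """Gradient of the loss w.r.t. alpha, given upstream[i][k] = dL/d out[i][k].

    out[i][k] is linear in alpha[i][j], so
    dL/d alpha[i][j] = sum_k upstream[i][k] * features[j][k].
    """
    num_tasks = len(features)
    return [
        [sum(u * f for u, f in zip(upstream[i], features[j])) for j in range(num_tasks)]
        for i in range(num_tasks)
    ]


# Five task-specific feature vectors (hypothetical 3-dimensional activations).
feats = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
mixed = cross_stitch(feats, init_alpha(len(feats)))
print([[round(x, 3) for x in row] for row in mixed])
```

With `alpha` set to the identity matrix, each task keeps only its own features and the model reduces to separate single-task networks; with every entry equal to 1/T, all tasks receive the same averaged representation. Learning `alpha` lets training settle anywhere between these two extremes, at the cost of keeping a full feature extractor per task.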
Sponsoring Organization: National Cancer Institute (NCI); National Institutes of Health (NIH); USDOE Office of Science (SC)

Grant/Contract Number: AC05-00OR22725; AC02-06CH11357; AC52-07NA27344; AC52-06NA25396

OSTI ID: 1574032

Journal Information: Journal of the American Medical Informatics Association; ISSN 1067-5027

Publisher: Oxford University Press

Country of Publication: United Kingdom

Language: English

Similar Records

Multi-Task Convolutional Neural Networks for Natural Text Classification

Software · 5 February 2018 · OSTI ID: code-45723

Yoon, Hong Jun; Alawad, Mohammed M; Tourassi, Georgia

Coarse-to-Fine Multi-Task Training of Convolutional Neural Networks for Automated Information Extraction from Cancer Pathology Reports

Conference · 1 March 2018 · OSTI ID: 1435267

Alawad, Mohammed; Yoon, Hong-Jun; Tourassi, Georgia

Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction

Conference · 1 December 2018 · 2018 IEEE International Conference on Big Data (Big Data) · OSTI ID: 1567566

Alawad, Mohammed; Hasan, S.M. Shamimul; Blair Christian, J.; et al.