Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks
Abstract
To trust a model’s predictions, new data scored by the model must come from the same population as the data used for training. If the model scores data that differ from its training data, neither the predictions nor the reported performance metrics can be trusted. Identifying and excluding such anomalous data points is therefore an important task when deploying models in the real world. Traditional machine learning algorithms and classifiers cannot abstain in this case. Here we propose a data-novelty detection algorithm for the Convolutional Neural Network classifier that yields a rejection score for each new data point scored. It is a post-modeling procedure that examines the distribution of convolution filters to determine whether the prediction should be trusted. We apply this algorithm to an information extraction model for a natural language text corpus and evaluate its performance using a primary cancer site classification model applied to cancer pathology reports. Results demonstrate that the algorithm is an effective way to exclude cancer pathology reports from model scoring when they do not contain the information needed to accurately classify the primary cancer type.
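The abstract does not say how the distribution of convolution filters is summarized or how the rejection score is computed. The sketch below is therefore only an illustration of the general idea and not the authors' algorithm. It assumes that each document is represented by one pooled activation per convolution filter, that training data are summarized by a per-filter mean and standard deviation, and that the rejection score is the mean absolute z-score across filters. The function names, the toy activations, and the threshold are all hypothetical.

```python
from statistics import fmean, pstdev


def fit_filter_stats(train_activations):
    """Summarize training data as a per-filter mean and standard deviation.

    train_activations: list of rows, one row of pooled filter activations
    per training document (an assumed representation).
    """
    columns = list(zip(*train_activations))
    means = [fmean(col) for col in columns]
    # A small floor keeps the z-score finite for filters that never vary.
    stds = [max(pstdev(col), 1e-8) for col in columns]
    return means, stds


def rejection_score(activations, means, stds):
    """Mean absolute z-score across filters (assumed scoring rule).

    Larger values mean the document looks less like the training data.
    """
    z_scores = [abs(a - m) / s for a, m, s in zip(activations, means, stds)]
    return fmean(z_scores)


def should_abstain(activations, means, stds, threshold):
    """Abstain from scoring when the rejection score exceeds the threshold."""
    return rejection_score(activations, means, stds) > threshold


# Toy data: four training documents, two convolution filters.
train = [[1.0, 2.0], [3.0, 2.0], [1.0, 4.0], [3.0, 4.0]]
means, stds = fit_filter_stats(train)

typical_doc = [2.5, 3.0]   # close to the training distribution
novel_doc = [7.0, -1.0]    # far from the training distribution
threshold = 3.0            # hypothetical; would be tuned on held-out data

print(rejection_score(typical_doc, means, stds))
print(rejection_score(novel_doc, means, stds))
print(should_abstain(novel_doc, means, stds, threshold))
```

In practice the threshold would be chosen from held-out in-distribution reports, for example as a high percentile of their rejection scores, so that only a small fraction of valid reports is excluded.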
- Authors:
- Yoon, Hong-Jun; Qiu, John X.; Christian, Blair; Hinkle, Jacob; Alamudun, Folami; Tourassi, Georgia
- ORNL
- Publication Date:
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1509553
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: INNS Big Data and Deep Learning 2019, Genoa, Italy, April 16-18, 2019
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Yoon, Hong-Jun, Qiu, John X., Christian, Blair, Hinkle, Jacob, Alamudun, Folami, and Tourassi, Georgia. Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks. United States: N. p., 2019. Web. doi:10.1007/978-3-030-16841-4_9.
Yoon, Hong-Jun, Qiu, John X., Christian, Blair, Hinkle, Jacob, Alamudun, Folami, & Tourassi, Georgia. Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks. United States. https://doi.org/10.1007/978-3-030-16841-4_9
Yoon, Hong-Jun, Qiu, John X., Christian, Blair, Hinkle, Jacob, Alamudun, Folami, and Tourassi, Georgia. 2019. "Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks". United States. https://doi.org/10.1007/978-3-030-16841-4_9. https://www.osti.gov/servlets/purl/1509553.
@article{osti_1509553,
title = {Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks},
author = {Yoon, Hong-Jun and Qiu, John X. and Christian, Blair and Hinkle, Jacob and Alamudun, Folami and Tourassi, Georgia},
doi = {10.1007/978-3-030-16841-4_9},
url = {https://www.osti.gov/biblio/1509553},
issn = {1064-3745},
place = {United States},
year = {2019},
month = {4}
}