Classifying Cancer Pathology Reports with Hierarchical Self-Attention Networks
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Health Data Sciences Institute, Computational Sciences and Engineering Division
- National Cancer Institute, Bethesda, MD (United States). Division of Cancer Control and Population Sciences, Surveillance Informatics Branch
- Louisiana State Univ., New Orleans, LA (United States). School of Public Health, Health Sciences Center, Louisiana Tumor Registry
- Information Management Services, Inc., Calverton, MD (United States)
We introduce a deep learning architecture, hierarchical self-attention networks (HiSANs), designed for classifying pathology reports, and show how this architecture leads to a new state of the art in accuracy, faster training, and clear interpretability. We evaluate performance on a corpus of 374,899 pathology reports obtained from the National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) program. Each pathology report is associated with five clinical classification tasks: site, laterality, behavior, histology, and grade. We compare the performance of the HiSAN against other machine learning and deep learning approaches commonly used on medical text data: Naive Bayes, logistic regression, convolutional neural networks, and hierarchical attention networks (the previous state of the art). We show that HiSANs are superior to these other text classifiers in both accuracy and macro F-score across all five classification tasks. Compared to hierarchical attention networks, HiSANs are not only an order of magnitude faster to train, but also achieve about 1% higher relative accuracy and 5% higher relative macro F-score.
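The abstract reports results as accuracy, macro F-score, and relative improvement over a baseline. The following is a minimal sketch of how these three quantities are commonly computed, not the authors' evaluation code. The function names, the toy labels, and the choice to average F-scores over every label seen in either the true or predicted labels are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    # Assumption: average over every label seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_improvement(new, old):
    """Improvement of `new` over `old`, expressed as a fraction of `old`."""
    return (new - old) / old


# Hypothetical toy labels for an imbalanced two-class task (not SEER data).
y_true = ["breast"] * 8 + ["lung"] * 2
baseline = ["breast"] * 10                    # always predicts the majority class
improved = ["breast"] * 8 + ["lung", "breast"]  # recovers one of the rare cases

for name, pred in [("baseline", baseline), ("improved", improved)]:
    print(name, round(accuracy(y_true, pred), 4), round(macro_f1(y_true, pred), 4))

print("relative accuracy gain:",
      round(relative_improvement(accuracy(y_true, improved),
                                 accuracy(y_true, baseline)), 4))
```

On imbalanced label sets like this toy one, macro F-score moves much more than accuracy when a rare class starts being predicted correctly, which is one reason the two metrics are usually reported together.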
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1785219
- Journal Information:
- Artificial Intelligence in Medicine, Vol. 101; ISSN 0933-3657
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English