OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automated Histologic Grading from Free-Text Pathology Reports Using Graph-of-Words Features and Machine Learning

Authors:
 Yoon, Hong-Jun [1]; Roberts, Larry W [1]; Tourassi, Georgia [1]
  1. ORNL
Publication Date:
2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
Work for Others (WFO)
OSTI Identifier:
1340458
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: BHI-2017 International Conference on Biomedical and Health Informatics, Orlando, FL, USA, February 16, 2017
Country of Publication:
United States
Language:
English

Citation Formats

Yoon, Hong-Jun, Roberts, Larry W, and Tourassi, Georgia. Automated Histologic Grading from Free-Text Pathology Reports Using Graph-of-Words Features and Machine Learning. United States: N. p., 2017. Web.
Yoon, Hong-Jun, Roberts, Larry W, & Tourassi, Georgia. (2017). Automated Histologic Grading from Free-Text Pathology Reports Using Graph-of-Words Features and Machine Learning. United States.
Yoon, Hong-Jun, Roberts, Larry W, and Tourassi, Georgia. 2017. "Automated Histologic Grading from Free-Text Pathology Reports Using Graph-of-Words Features and Machine Learning". United States. https://www.osti.gov/servlets/purl/1340458.
@inproceedings{osti_1340458,
title = {Automated Histologic Grading from Free-Text Pathology Reports Using Graph-of-Words Features and Machine Learning},
author = {Yoon, Hong-Jun and Roberts, Larry W and Tourassi, Georgia},
booktitle = {BHI-2017 International Conference on Biomedical and Health Informatics},
address = {Orlando, FL, USA},
place = {United States},
year = {2017},
url = {https://www.osti.gov/servlets/purl/1340458}
}

Availability:
Conference proceeding (other availability)
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study we investigated deep learning with a convolutional neural network (CNN) for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, comparing a CNN against a more conventional term frequency vector space approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. The deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, with micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, although trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
  • Building on technical advances from the BioNLP 2009 Shared Task Challenge, the 2011 challenge set out to generalize techniques to other complex biological event extraction tasks. In this paper, we present the implementation and evaluation of a signature-based machine-learning technique to predict events from the full texts of infectious disease documents. Specifically, our approach uses novel signatures composed of traditional linguistic features and semantic knowledge to predict event triggers and their candidate arguments. Using a leave-one-out analysis, we report the contribution of linguistic and shallow semantic features to trigger prediction and candidate argument extraction. Lastly, we examine the evaluation results and posit causes of errors in the infectious disease track subtasks.
  • Various computer-assisted technologies have been developed to help radiologists detect cancer; however, the algorithms still lack high sensitivity and specificity, and they must be trained against a set with known pathologies to refine them toward closer agreement with ground truth. This work describes an approach to learning cue phrase patterns in radiology reports that uses a genetic algorithm (GA) as the learning method. The approach successfully learned cue phrase patterns for two distinct classes of radiology reports. These patterns can then serve as a basis for automatically categorizing, clustering, or retrieving relevant data for the user.
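The first similar record reports its CNN results as micro- and macro-averaged F scores. As a generic illustration of how the two averages differ (a sketch of the standard definitions, not the evaluation code used in that study), the hypothetical helper `micro_macro_f1` below pools true-positive, false-positive, and false-negative counts across all classes for the micro score, and averages per-class F1 values for the macro score. It assumes single-label, multi-class data; the labels "A" and "B" in the example are hypothetical placeholders, not ICD-O-3 codes.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: list[str], pred: list[str]) -> tuple[float, float]:
    """Hypothetical helper: (micro-F1, macro-F1) for single-label multi-class data."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # wrongly predicted label p
            fn[g] += 1  # missed the true label g
    # Micro: pool counts over all classes, then compute a single F1.
    # (For single-label data this equals plain accuracy.)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 for each class, then take the unweighted mean,
    # so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    micro, macro = micro_macro_f1(["A"] * 8 + ["B"] * 2, ["A"] * 10)
    print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

With eight "A" reports, two "B" reports, and a model that always predicts "A", the micro-F is 0.8 while the macro-F is about 0.444, because the rare class "B" scores zero and counts equally in the macro mean. This is why macro-F tends to be more sensitive to poorly populated labels.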