Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Tennessee, Knoxville, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Univ. of Kentucky, Lexington, KY (United States)
- Louisiana State Univ., New Orleans, LA (United States)
- Rutgers Univ., New Brunswick, NJ (United States)
- Univ. of Utah, Salt Lake City, UT (United States)
- Univ. of Washington, Seattle, WA (United States)
- Univ. of New Mexico, Albuquerque, NM (United States)
- Information Management Services, Inc., Calverton, MD (United States)
- National Cancer Institute, Bethesda, MD (United States)
One of the goals of the Surveillance, Epidemiology, and End Results (SEER) program is to estimate the incidence, prevalence, and mortality of all cancers. To that end, cancer registries across the country maintain a massive database of cancer pathology reports, which contain rich information for understanding cancer trends. However, these reports are stored as unstructured text, and human annotators must read them and extract the relevant information. In this article, we show that existing deep learning models for automating information extraction from cancer pathology reports can be significantly improved by using ensemble model distillation. We found that by training multiple predictive models and transferring their knowledge to a single, low-resource model, we can reduce the number of highly confident wrong predictions. Our results show that the implemented methods could save thousands of hours of manual annotation.
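The idea of transferring knowledge from several trained models to a single smaller one can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architectures, loss, or temperature, so the function names (`ensemble_soft_labels`, `distillation_loss`), the temperature parameter, and the choice of averaging teacher probabilities are all assumptions made for illustration. The teachers' predicted class probabilities are averaged into soft labels, and the student is scored by its cross-entropy against those soft labels.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_soft_labels(teacher_logits, temperature=1.0):
    """Average the teachers' class probabilities (assumed aggregation rule).

    teacher_logits has shape (n_teachers, n_reports, n_classes).
    Returns soft labels of shape (n_reports, n_classes).
    """
    return softmax(teacher_logits, temperature).mean(axis=0)


def distillation_loss(student_logits, soft_labels, temperature=1.0):
    """Mean cross-entropy between the soft labels and the student's output."""
    log_p = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-(soft_labels * log_p).sum(axis=-1).mean())


# Hypothetical data: 5 teachers, 4 reports, 3 classes.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(5, 4, 3))
soft_labels = ensemble_soft_labels(teacher_logits)

# A student that reproduces the soft labels exactly, and an uninformed one.
matched_loss = distillation_loss(np.log(soft_labels), soft_labels)
uniform_loss = distillation_loss(np.zeros((4, 3)), soft_labels)
```

In this sketch, a student whose output matches the ensemble average reaches the lowest possible loss, while an uninformed student with uniform output scores worse. In practice the student would be trained by gradient descent on this loss, often combined with the usual loss on the annotated labels.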
- Research Organization:
- Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- Centers for Disease Control and Prevention (CDC); NCI Surveillance, Epidemiology and End Results (SEER); National Institutes of Health (NIH); USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
- Grant/Contract Number:
- AC02-06CH11357; AC05-00OR22725; AC52-06NA25396; AC52-07NA27344
- OSTI ID:
- 1887696
- Journal Information:
- Journal Name: JAMIA Open; Journal Volume: 5; Journal Issue: 3; ISSN 2574-2531
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
Similar Records
FrESCO: Framework for Exploring Scalable Computational Oncology
Large-scale deep learning for metastasis detection in pathology reports