U.S. Department of Energy
Office of Scientific and Technical Information

Proposing New RadLex Terms by Analyzing Free-Text Mammography Reports

Journal Article · Journal of Digital Imaging (Online)
 [1];  [2];  [3];  [1]
  1. Stanford University, Department of Radiology and Department of Biomedical Data Science, Medical School Office Building (MSOB) (United States)
  2. University of Washington, Department of Radiology, Seattle Cancer Care Alliance (United States)
  3. University of Wisconsin School of Medicine and Public Health, Department of Radiology, E3/311 Clinical Science Center (United States)
After years of development, the RadLex terminology contains a large set of controlled terms for the radiology domain, but gaps still exist. We developed a data-driven approach to discover new terms for RadLex by mining a large corpus of radiology reports using natural language processing (NLP) methods. Our system, developed for mammography to extend the mammography part of RadLex, extracts noun phrases from free-text mammography reports and classifies each noun phrase as “Has Candidate RadLex Term” or “Does Not Have Candidate RadLex Term.” We tested the performance of our algorithm on 100 free-text mammography reports, with an expert radiologist determining the true positive and true negative RadLex candidate terms, and calculated precision (positive predictive value) and recall (sensitivity) to judge the system’s performance. Our method demonstrated a precision of 0.77 (159/206 terms) and a recall of 0.94 (159/170 terms); the overall accuracy of the system was 0.80 (235/293 terms). Finally, to identify new candidate terms for enhancing RadLex, we applied our NLP method to 270,540 free-text mammography reports obtained from three academic institutions, where it found 31,800 unique noun phrases that are potential candidates for RadLex. Our data-driven approach to mining radiology reports can identify new candidate terms for expanding the breast imaging lexicon portion of RadLex and may be a useful approach for discovering new candidate terms from other radiology domains.
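The reported metrics can be checked against one another with a short calculation. The sketch below is illustrative only and is not the authors' code: the abstract states only the ratios 159/206, 159/170, and 235/293, so the false positive, false negative, and true negative counts are derived from those ratios by subtraction (an assumption about how the fractions were formed), and the `ConfusionCounts` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # noun phrases correctly flagged as candidate terms
    fp: int  # noun phrases flagged, but rejected by the radiologist
    fn: int  # candidate terms the system missed
    tn: int  # noun phrases correctly left unflagged

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def precision(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Sensitivity: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def accuracy(self) -> float:
        # (TP + TN) / all evaluated noun phrases
        return (self.tp + self.tn) / self.total


# Counts derived from the abstract's ratios (assumed, not stated directly):
#   precision 159/206 -> FP = 206 - 159 = 47
#   recall    159/170 -> FN = 170 - 159 = 11
#   accuracy  235/293 -> TN = 235 - 159 = 76
counts = ConfusionCounts(tp=159, fp=47, fn=11, tn=76)

print(f"precision = {counts.precision:.2f}")
print(f"recall    = {counts.recall:.2f}")
print(f"accuracy  = {counts.accuracy:.2f}")
print(f"total     = {counts.total}")
```

Under these derived counts the four cells sum to 293, matching the denominator of the reported accuracy, which suggests the three published figures are mutually consistent.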
OSTI ID: 22795588
Journal Information: Journal of Digital Imaging (Online), Vol. 31, Issue 5; ISSN 1618-727X
Country of Publication: United States
Language: English

Similar Records

Discovering Potential Precursors of Mammography Abnormalities based on Textual Features, Frequencies, and Sequences
Conference · Thu Dec 31 23:00:00 EST 2009 · OSTI ID:986826

A UMLS-based spell checker for natural language processing in vaccine safety
Journal Article · Sun Feb 11 19:00:00 EST 2007 · BMC Medical Informatics and Decision Making (Online) · OSTI ID:1626564

RECONCILE: a machine-learning coreference resolution system
Software · Mon Dec 10 00:00:00 EST 2007 · OSTI ID:1304621