Title: Adversarial Training for Privacy-Preserving Deep Learning Model Distribution

Abstract

Collaboration among cancer registries is essential to develop accurate, robust, and generalizable deep learning models for automated information extraction from cancer pathology reports. Sharing data presents a serious privacy issue, especially in biomedical research and healthcare delivery domains. Distributing pretrained deep learning (DL) models has been proposed to avoid critical data sharing. However, there is growing recognition that collaboration among clinical institutes through DL model distribution exposes new security and privacy vulnerabilities. These vulnerabilities increase in natural language processing (NLP) applications, in which the dataset vocabulary with word vector representations needs to be associated with the other model parameters. In this paper, we propose a novel privacy-preserving DL model distribution across cancer registries for information extraction from cancer pathology reports with privacy and confidentiality considerations. The proposed approach exploits the adversarial training framework to distinguish private features from shared features among different datasets. It shares only registry-invariant model parameters, without sharing raw data or registry-specific model parameters among cancer registries. Thus, it protects both the data and the trained model simultaneously. We compare our proposed approach to single-registry models and to a model trained on centrally hosted data from different cancer registries. The results show that the proposed approach significantly outperforms the single-registry models and achieves micro and macro F1-scores that are statistically indistinguishable from those of the centralized model.
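
The abstract reports both micro and macro F1-scores but does not define them. As a minimal sketch using the standard definitions for single-label classification (the function name `f1_scores` and the cancer-site labels below are illustrative assumptions, not taken from the paper), micro F1 pools the error counts over all classes, while macro F1 averages the per-class F1 values so that rare classes weigh as much as common ones:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["breast", "breast", "breast", "lung", "colon"]
y_pred = ["breast", "breast", "breast", "colon", "colon"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy example the single missed "lung" report lowers macro F1 much more than micro F1, which is why the two metrics are usually reported together when class frequencies are imbalanced.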

Authors:
Alawad, Mohammed [1]; Gao, Shang [1]; Wu, Xiao-Cheng [2]; Durbin, Eric B. [3]; Coyle, Linda [4]; Penberthy, Lynne [4]; Tourassi, Georgia [1]
  1. ORNL
  2. LSUHSC-Louisiana Tumor Registry
  3. University of Kentucky
  4. National Cancer Institute, Bethesda, MD
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1606810
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2019 IEEE International Conference on Big Data (IEEE BigData 2019) - Los Angeles, California, United States of America - 12/9/2019 to 12/12/2019
Country of Publication:
United States
Language:
English

Citation Formats

Alawad, Mohammed, Gao, Shang, Wu, Xiao-Cheng, Durbin, Eric B., Coyle, Linda, Penberthy, Lynne, and Tourassi, Georgia. Adversarial Training for Privacy-Preserving Deep Learning Model Distribution. United States: N. p., 2019. Web. doi:10.1109/BigData47090.2019.9006131.
Alawad, Mohammed, Gao, Shang, Wu, Xiao-Cheng, Durbin, Eric B., Coyle, Linda, Penberthy, Lynne, & Tourassi, Georgia. Adversarial Training for Privacy-Preserving Deep Learning Model Distribution. United States. https://doi.org/10.1109/BigData47090.2019.9006131
Alawad, Mohammed, Gao, Shang, Wu, Xiao-Cheng, Durbin, Eric B., Coyle, Linda, Penberthy, Lynne, and Tourassi, Georgia. 2019. "Adversarial Training for Privacy-Preserving Deep Learning Model Distribution". United States. https://doi.org/10.1109/BigData47090.2019.9006131. https://www.osti.gov/servlets/purl/1606810.
@article{osti_1606810,
title = {Adversarial Training for Privacy-Preserving Deep Learning Model Distribution},
author = {Alawad, Mohammed and Gao, Shang and Wu, Xiao-Cheng and Durbin, Eric B. and Coyle, Linda and Penberthy, Lynne and Tourassi, Georgia},
abstractNote = {Collaboration among cancer registries is essential to develop accurate, robust, and generalizable deep learning models for automated information extraction from cancer pathology reports. Sharing data presents a serious privacy issue, especially in biomedical research and healthcare delivery domains. Distributing pretrained deep learning (DL) models has been proposed to avoid critical data sharing. However, there is growing recognition that collaboration among clinical institutes through DL model distribution exposes new security and privacy vulnerabilities. These vulnerabilities increase in natural language processing (NLP) applications, in which the dataset vocabulary with word vector representations needs to be associated with the other model parameters. In this paper, we propose a novel privacy-preserving DL model distribution across cancer registries for information extraction from cancer pathology reports with privacy and confidentiality considerations. The proposed approach exploits the adversarial training framework to distinguish private features from shared features among different datasets. It shares only registry-invariant model parameters, without sharing raw data or registry-specific model parameters among cancer registries. Thus, it protects both the data and the trained model simultaneously. We compare our proposed approach to single-registry models and to a model trained on centrally hosted data from different cancer registries. The results show that the proposed approach significantly outperforms the single-registry models and achieves micro and macro F1-scores that are statistically indistinguishable from those of the centralized model.},
doi = {10.1109/BigData47090.2019.9006131},
url = {https://www.osti.gov/biblio/1606810},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {12}
}

