Adversarial Training for Privacy-Preserving Deep Learning Model Distribution

Alawad, Mohammed; Gao, Shang; Wu, Xiao-Cheng; Durbin, Eric B.; Coyle, Linda; Penberthy, Lynne; Tourassi, Georgia

doi:10.1109/BigData47090.2019.9006131

Conference · Sat Nov 30 23:00:00 EST 2019

DOI: https://doi.org/10.1109/BigData47090.2019.9006131 · OSTI ID: 1606810

Alawad, Mohammed ^[1]; Gao, Shang ^[1]; Wu, Xiao-Cheng ^[2]; Durbin, Eric B. ^[3]; Coyle, Linda ^[4]; Penberthy, Lynne ^[4]; Tourassi, Georgia ^[1]

[1] ORNL
[2] LSUHSC-Louisiana Tumor Registry
[3] University of Kentucky
[4] National Cancer Institute, Bethesda, MD

Collaboration among cancer registries is essential to develop accurate, robust, and generalizable deep learning (DL) models for automated information extraction from cancer pathology reports. Sharing data, however, presents a serious privacy issue, especially in biomedical research and healthcare delivery domains. Distributing pretrained DL models has been proposed to avoid sharing sensitive data. However, there is growing recognition that collaboration among clinical institutions through DL model distribution exposes new security and privacy vulnerabilities. These vulnerabilities increase in natural language processing (NLP) applications, in which the dataset vocabulary with word vector representations needs to be associated with the other model parameters. In this paper, we propose a novel privacy-preserving approach to DL model distribution across cancer registries for information extraction from cancer pathology reports with privacy and confidentiality considerations. The proposed approach exploits the adversarial training framework to distinguish private features from shared features among different datasets. It shares only registry-invariant model parameters, without sharing raw data or registry-specific model parameters among cancer registries. Thus, it protects both the data and the trained model simultaneously. We compare our proposed approach to single-registry models and to a model trained on centrally hosted data from different cancer registries. The results show that the proposed approach significantly outperforms the single-registry models and achieves micro and macro F1-scores that are statistically indistinguishable from those of the centralized model.
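The abstract reports both micro and macro F1-scores. As a minimal sketch of how the two averages differ, and not the authors' evaluation code, the following Python computes both for single-label multi-class predictions. The function name `f1_scores` and the toy site labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels (not data from the paper).
y_true = ["breast", "breast", "breast", "lung", "lung", "colon"]
y_pred = ["breast", "breast", "lung", "lung", "lung", "breast"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Micro F1 pools counts over all documents, so frequent classes dominate it. Macro F1 weights every class equally, so a rare class that is never predicted correctly (here the hypothetical "colon") pulls it down sharply. Reporting both gives a view of overall and per-class performance.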

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1606810

Country of Publication: United States

Language: English


