U.S. Department of Energy
Office of Scientific and Technical Information

Privacy-Preserving Knowledge Transfer with Bootstrap Aggregation of Teacher Ensembles

Conference

There is a need to transfer knowledge among institutions and organizations to save effort in annotation and labeling or to enhance task performance. However, knowledge transfer is difficult because of restrictions in place to ensure data security and privacy: institutions are not allowed to exchange data or perform any activity that may expose personal information. Leveraging a differential privacy algorithm in a high-performance computing environment, we propose a new training protocol, Bootstrap Aggregation of Teacher Ensembles (BATE), which is applicable to various types of machine learning models. The BATE algorithm builds on and enhances the PATE algorithm, maintaining competitive task performance scores on complex datasets with underrepresented class labels. We conducted a proof-of-concept study of information extraction from cancer pathology report data from four cancer registries and compared four scenarios: no collaboration, collaboration without privacy preservation, the PATE algorithm, and the proposed BATE algorithm. The results showed that the BATE algorithm maintained competitive macro-averaged F1 scores, demonstrating that the proposed algorithm is an effective yet privacy-preserving method for machine learning and deep learning solutions.
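The abstract does not spell out how BATE works, so the following is only a rough sketch of the two ingredients its name suggests: bootstrap resampling of each teacher's training data, and PATE-style noisy-max aggregation of teacher votes. It assumes Laplace noise on the vote counts, as in the original PATE formulation. The function names, the `noise_scale` parameter, and the majority-label "teacher" are hypothetical stand-ins and are not taken from the paper.

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement (one bootstrap replicate)."""
    return [rng.choice(records) for _ in records]


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, built as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_label(teacher_votes, num_classes, noise_scale, rng):
    """PATE-style aggregation: perturb per-class vote counts, return the argmax."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + laplace_noise(noise_scale, rng)
             for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])


def train_majority_teacher(sample):
    """Hypothetical stand-in for a real model: always predicts the most common label."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda _text: majority


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy (text, label) pairs standing in for one registry's private data.
    private_data = [("report a", 0), ("report b", 1), ("report c", 1), ("report d", 2)]
    # Each teacher sees its own bootstrap replicate (an assumption about BATE).
    teachers = [train_majority_teacher(bootstrap_sample(private_data, rng))
                for _ in range(25)]
    votes = [teacher("unlabeled report") for teacher in teachers]
    # Only the noisy aggregate label would be released to train a student model.
    print(noisy_max_label(votes, num_classes=3, noise_scale=2.0, rng=rng))
```

In this sketch a larger `noise_scale` gives stronger privacy for any single teacher's vote at the cost of noisier labels for the student; the actual noise mechanism and privacy accounting used in the paper may differ.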

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1771902
Country of Publication:
United States
Language:
English
