Probing for Artifacts: Detecting ImageNet Model Evasions
Conference
- Battelle (Pacific Northwest Laboratory)
While deep learning models have made incredible progress across a variety of machine learning tasks, they remain vulnerable to adversarial examples crafted to fool otherwise trustworthy models. In this work we approach this problem through the lens of a detection framework. We propose a classification network that uses the hidden-layer activations of a trained model as inputs to detect adversarial artifacts in an input. We train this classification network simultaneously against multiple adversarial algorithms to create a more robust detector, and we show higher detection rates than several alternative methods. The novelty of our approach lies in the scale and scope of probing ImageNet models for adversarial artifacts. In addition, we propose an improvement to feature squeezing, another common adversarial-example detection method.
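To make the probing idea concrete, here is a minimal sketch of an activation-based detector, assuming a pretrained PyTorch ResNet-50 stands in for the ImageNet model. The probe layers, the MLP detector head, and names such as adversarial_score are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
from torchvision import models

# Pretrained ImageNet backbone whose hidden activations we probe
# (illustrative choice; requires torchvision >= 0.13 for the weights enum).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Collect activations from chosen intermediate layers via forward hooks.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Global-average-pool each spatial feature map to a fixed-size vector.
        activations[name] = output.mean(dim=(2, 3))
    return hook

probe_points = ["layer2", "layer3", "layer4"]  # assumed probe layers
for name in probe_points:
    getattr(backbone, name).register_forward_hook(make_hook(name))

# Hypothetical detector head: a small MLP over the concatenated activations.
# 512 + 1024 + 2048 are the channel widths of the probed ResNet-50 stages.
detector = nn.Sequential(
    nn.Linear(512 + 1024 + 2048, 256),
    nn.ReLU(),
    nn.Linear(256, 1),  # one logit: how likely the input is adversarial
)

def adversarial_score(x):
    with torch.no_grad():
        backbone(x)  # the forward pass populates `activations` via the hooks
    feats = torch.cat([activations[n] for n in probe_points], dim=1)
    return torch.sigmoid(detector(feats)).squeeze(1)

Training such a detector would pair clean ImageNet inputs with adversarial versions produced by several attack algorithms (e.g., FGSM and PGD) under a binary cross-entropy loss, so that it learns artifacts shared across attacks rather than one algorithm's signature; this is the simultaneous multi-attack training the abstract describes.

The feature-squeezing baseline the abstract mentions can be sketched in the same spirit. The paper's specific improvement is not reproduced here; the standard detector of Xu et al. compares a model's output on the raw input against its output on a "squeezed" copy (e.g., after bit-depth reduction) and flags large disagreement:

def reduce_bit_depth(x, bits=4):
    # Squeeze pixel values in [0, 1] down to 2**bits discrete levels.
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def feature_squeezing_score(model, x):
    # Classic feature-squeezing test: a large L1 shift between the softmax
    # outputs on the raw and squeezed inputs flags a likely adversarial input.
    # The decision threshold is tuned on held-out clean/adversarial data.
    with torch.no_grad():
        p_raw = torch.softmax(model(x), dim=1)
        p_squeezed = torch.softmax(model(reduce_bit_depth(x)), dim=1)
    return (p_raw - p_squeezed).abs().sum(dim=1)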
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1673321
- Report Number(s): PNNL-SA-152048
- Country of Publication: United States
- Language: English
Similar Records
Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics
Journal Article · January 20, 2025 · Statistical Analysis and Data Mining · OSTI ID: 2504244