U.S. Department of Energy
Office of Scientific and Technical Information

Probing for Artifacts: Detecting Imagenet Model Evasions

Conference · OSTI ID: 1673321
While deep learning models have made incredible progress across a variety of machine learning tasks, they remain vulnerable to adversarial examples crafted to fool otherwise trustworthy models. In this work we approach this problem through the lens of a detection framework. We propose a classification network that takes the hidden-layer activations of a trained model as input and detects adversarial artifacts in the original example. We train this classification network simultaneously against multiple adversarial algorithms to create a more robust detector, and we show higher detection rates than several alternatives. The novelty of our approach lies in the scale and scope of probing ImageNet models for adversarial artifacts. In addition, we propose an improvement to feature squeezing, another common adversarial-example detection method.
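The core idea in the abstract — training a secondary classifier on a model's hidden-layer activations to flag adversarial inputs — can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic Gaussian "activations," the mean shift for adversarial examples, and the linear (logistic-regression) probe are all illustrative assumptions standing in for real ImageNet activations, real attack algorithms, and the paper's classification network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden-layer activations: clean inputs cluster around one
# mean, adversarial inputs around a shifted mean. In the paper's setting these
# vectors would be activations probed from a trained ImageNet model on clean
# versus attacked images (an assumption here: a simple mean shift models the
# "adversarial artifact" in activation space).
n, d = 200, 32
clean = rng.normal(0.0, 1.0, size=(n, d))
adv = rng.normal(0.6, 1.0, size=(n, d))

X = np.vstack([clean, adv])
y = np.concatenate([np.zeros(n), np.ones(n)])  # label 1 = adversarial

# Logistic-regression detector on the activation features, trained with plain
# gradient descent. The paper uses a classification network; a linear probe is
# the simplest instance of the same detect-from-activations idea.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(adversarial)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = float(np.mean(pred == y))
```

In practice one would concatenate activations probed from several layers, and train the detector on examples produced by multiple attack algorithms at once, which is what the abstract credits for the detector's robustness.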
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
Report Number(s):
PNNL-SA-152048
Country of Publication:
United States
Language:
English

Similar Records

Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics
Journal Article · 2025 · Statistical Analysis and Data Mining · OSTI ID: 2504244
