U.S. Department of Energy
Office of Scientific and Technical Information

Probing for Artifacts: Detecting Imagenet Model Evasions

Conference · OSTI ID: 1673321
While deep learning models have made incredible progress across a variety of machine learning tasks, they remain vulnerable to adversarial examples crafted to fool otherwise trustworthy models. In this work we approach this problem through the lens of a detection framework. We propose a classification network that takes the hidden-layer activations of a trained model as input and detects adversarial artifacts in the original example. We train this classification network simultaneously against multiple adversarial algorithms to create a more robust detector, and we show higher detection rates than several alternatives. The novelty of our approach lies in the scale and scope of probing ImageNet models for adversarial artifacts. In addition, we propose an improvement to feature squeezing, another common adversarial-example detection method.
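The core idea in the abstract — training a secondary classifier on a model's hidden-layer activations to flag adversarial inputs — can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic Gaussian "activations," the mean shift for adversarial examples, and the linear (logistic-regression) probe are all illustrative assumptions standing in for real ImageNet activations, real attack algorithms, and the paper's classification network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden-layer activations: clean inputs cluster around one
# mean, adversarial inputs around a shifted mean. In the paper's setting these
# vectors would be activations probed from a trained ImageNet model on clean
# versus attacked images (an assumption here: a simple mean shift models the
# "adversarial artifact" in activation space).
n, d = 200, 32
clean = rng.normal(0.0, 1.0, size=(n, d))
adv = rng.normal(0.6, 1.0, size=(n, d))

X = np.vstack([clean, adv])
y = np.concatenate([np.zeros(n), np.ones(n)])  # label 1 = adversarial

# Logistic-regression detector on the activation features, trained with plain
# gradient descent. The paper uses a classification network; a linear probe is
# the simplest instance of the same detect-from-activations idea.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(adversarial)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = float(np.mean(pred == y))
```

In practice one would concatenate activations probed from several layers, and train the detector on examples produced by multiple attack algorithms at once, which is what the abstract credits for the detector's robustness.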
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
Report Number(s):
PNNL-SA-152048
Country of Publication:
United States
Language:
English

Similar Records

Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics
Journal Article · 2025 · Statistical Analysis and Data Mining · OSTI ID: 2504244
