U.S. Department of Energy
Office of Scientific and Technical Information

Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics

Journal Article · Statistical Analysis and Data Mining
DOI: https://doi.org/10.1002/sam.11716 · OSTI ID: 2504244
Authors: [1]; [2]; [3]; [4]; [3]; [5]; [6]
  1. Univ. of Arizona, Tucson, AZ (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
  2. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Univ. of Texas at San Antonio, TX (United States)
  3. Univ. of Arizona, Tucson, AZ (United States)
  4. Univ. of Texas, Arlington, TX (United States)
  5. Univ. of Texas at San Antonio, TX (United States)
  6. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Several hypotheses have been proposed to explain the existence of adversarial examples for classification problems. These include the high dimensionality of the data, the high codimension of the data manifolds of interest in the ambient space, and the possibility that the structure of machine learning models encourages classifiers to develop decision boundaries close to data points. This article proposes a new framework for studying adversarial examples that does not depend directly on the distance to the decision boundary. As in the smoothed-classifier literature, we define a (natural or adversarial) data point to be (γ, σ)-stable if the probability of the same classification is at least γ for points sampled from a Gaussian neighborhood of the point with standard deviation σ. We focus on studying the differences in persistence metrics along interpolants between natural and adversarial points. We show that adversarial examples have significantly lower persistence than natural examples for large neural networks on the MNIST and ImageNet datasets. We connect this lack of persistence with decision boundary geometry by measuring the angles of interpolants with respect to decision boundaries. Finally, we connect this approach to robustness by developing a manifold-alignment gradient metric and demonstrating the increase in robustness achieved when this metric is added during training.
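As a concrete illustration of the (γ, σ)-stability definition in the abstract, the short PyTorch sketch below estimates the probability of an unchanged classification under Gaussian perturbation by Monte Carlo sampling and evaluates that estimate along a linear interpolant between a natural and an adversarial point. It is a sketch of the stated definition only, not the authors' implementation; the classifier interface, sample count, and interpolation scheme are assumptions made for the example.

import torch

def stability(model, x, sigma, n_samples=256):
    # Monte Carlo estimate of P[f(x + eps) == f(x)] with eps ~ N(0, sigma^2 I);
    # x is (gamma, sigma)-stable when this estimate is at least gamma.
    model.eval()
    with torch.no_grad():
        base_label = model(x.unsqueeze(0)).argmax(dim=1)      # label at the point itself
        noise = sigma * torch.randn(n_samples, *x.shape)      # samples from the Gaussian neighborhood
        labels = model(x.unsqueeze(0) + noise).argmax(dim=1)  # labels of the perturbed points
    return (labels == base_label).float().mean().item()

def stability_along_interpolant(model, x_nat, x_adv, sigma, steps=20):
    # Stability evaluated at points (1 - a) * x_nat + a * x_adv; lower values near
    # a = 1 would reflect the lower persistence of adversarial examples reported above.
    alphas = torch.linspace(0.0, 1.0, steps)
    return [(a.item(), stability(model, (1 - a) * x_nat + a * x_adv, sigma))
            for a in alphas]

For a fixed γ, one could, for example, record the largest σ at which the estimate stays above γ as a persistence-style summary of a point.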
Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE; USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
89233218CNA000001
OSTI ID:
2504244
Alternate ID(s):
OSTI ID: 2530499
OSTI ID: 2530371
Report Number(s):
LA-UR-23-26395
Journal Information:
Statistical Analysis and Data Mining, Vol. 18, Issue 1; ISSN 1932-1864
Publisher:
Wiley
Country of Publication:
United States
Language:
English
