XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution

Conference
Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) can unintentionally provide adversaries with insights into blackbox models, increasing their vulnerability to various attacks. In this paper, we develop XSub, a novel explanation-driven adversarial attack against blackbox classifiers based on feature substitution. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with the corresponding important features of a different label, thereby increasing the likelihood that the model misclassifies the perturbed sample. XSub requires only a small number of queries and can be easily extended to launch backdoor attacks when the attacker has access to the model's training data. Our evaluation shows that XSub is not only effective and stealthy but also low-cost, demonstrating its feasibility across a wide range of AI applications.
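To make the feature-substitution idea concrete, the following is a minimal Python sketch of one plausible attack loop consistent with the abstract. It assumes tabular inputs as NumPy vectors; `model` (a label-only blackbox oracle), `explain` (a LIME- or SHAP-style importance oracle), and all parameter names are illustrative assumptions, not the paper's actual interface.

    import numpy as np

    def top_k_features(importance, k):
        # Indices of the k features with the largest importance magnitude.
        return np.argsort(np.abs(importance))[::-1][:k]

    def xsub_attack(model, explain, x, guide, target_label, k=3, max_queries=20):
        # Greedily copy the guide sample's most important features into x
        # until the blackbox model predicts the target label.
        #   model(x)          -> predicted label (query access only)
        #   explain(x, label) -> per-feature importance scores (hypothetical oracle)
        #   guide             -> a sample the model assigns `target_label`
        x_adv = x.copy()
        # Importances of the guide under the target label identify which
        # features most strongly signal that label.
        donor_idx = top_k_features(explain(guide, target_label), k)

        queries = 0
        for i in donor_idx:
            x_adv[i] = guide[i]            # substitute one feature at a time
            queries += 1                   # one query to check the new label
            if model(x_adv) == target_label:
                return x_adv, queries      # early exit keeps the query count low
            if queries >= max_queries:
                break
        return None, queries               # failed within the query budget

The greedy early exit reflects the paper's emphasis on query efficiency: each substitution costs one model query, and the loop stops at the first label flip. The backdoor extension mentioned in the abstract would plausibly amount to applying the same substitution to training samples rather than test-time inputs, though the paper's exact procedure may differ.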
Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE National Renewable Energy Laboratory (NREL), Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
AC36-08GO28308
OSTI ID:
2529417
Report Number(s):
NREL/CP-2C00-91278
Country of Publication:
United States
Language:
English

Similar Records

Attack on Grid Event Cause Analysis: An Adversarial Machine Learning Approach
Conference · January 2020 · 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT) · OSTI ID: 1958805

Defending Against Adversarial Examples
Technical Report · September 2019 · OSTI ID: 1569514

Sign-OPT: A Query-Efficient Hard-label Adversarial Attack
Conference · April 2020 · OSTI ID: 1958845