XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution
Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) can unintentionally give adversaries insight into blackbox models, making those models more vulnerable to various attacks. In this paper, we develop XSub, a novel explanation-driven adversarial attack against blackbox classifiers based on feature substitution. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with the corresponding important features of a different label, thereby increasing the likelihood that the model misclassifies the perturbed sample. XSub requires only a minimal number of queries and can be easily extended to launch backdoor attacks when the attacker has access to the model's training data. Our evaluation shows that XSub is not only effective and stealthy but also low-cost, showcasing its feasibility across a wide range of AI applications.
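The feature-substitution idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper `substitute_top_features`, the linear `toy_blackbox` classifier, the `WEIGHTS` vector, the donor sample, and the importance scores are all hypothetical stand-ins. In a real setting the importance scores would come from an XAI explainer, and the classifier would only be reachable through queries.

```python
def substitute_top_features(x, donor, importance, k):
    """Copy x, replacing its k highest-importance features with donor's values.

    Illustrative helper only; the importance scores are assumed to be given.
    """
    if not (len(x) == len(donor) == len(importance)):
        raise ValueError("x, donor and importance must have the same length")
    if not 0 <= k <= len(x):
        raise ValueError("k must be between 0 and the number of features")
    # Rank feature indices from most to least important.
    ranked = sorted(range(len(x)), key=lambda i: importance[i], reverse=True)
    top = ranked[:k]
    perturbed = list(x)
    for i in top:
        perturbed[i] = donor[i]
    return perturbed, top


# Hypothetical stand-in for a blackbox classifier: label 1 if the weighted sum is positive.
WEIGHTS = [2.0, -1.0, 0.5, 0.1]


def toy_blackbox(sample):
    return int(sum(w * v for w, v in zip(WEIGHTS, sample)) > 0)


x = [1.0, 0.5, 0.2, 0.3]        # original sample, classified as label 1
donor = [-1.0, 2.0, 0.0, 0.3]   # sample from a different label (label 0)
# Placeholder for explainer output: per-feature contribution magnitude.
importance = [abs(w * v) for w, v in zip(WEIGHTS, x)]

x_adv, changed = substitute_top_features(x, donor, importance, k=1)
label_before = toy_blackbox(x)
label_after = toy_blackbox(x_adv)
```

In this toy setup, substituting only the single most important feature is enough to change the predicted label, which mirrors the low-perturbation, low-query intent described above. How XSub actually selects the donor features and the number of substitutions is not specified in this record.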
- Research Organization:
- National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Organization:
- USDOE National Renewable Energy Laboratory (NREL), Laboratory Directed Research and Development (LDRD) Program
- DOE Contract Number:
- AC36-08GO28308
- OSTI ID:
- 2529417
- Report Number(s):
- NREL/CP-2C00-91278; MainId:93056; UUID:aa0173b0-4d16-4bbc-82a4-74752605ee4a; MainAdminId:75411
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Attack on Grid Event Cause Analysis: An Adversarial Machine Learning Approach · Conference · Fri Jan 31 23:00:00 EST 2020 · 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT) · OSTI ID:1958805
- Defending Against Adversarial Examples · Technical Report · Sun Sep 01 00:00:00 EDT 2019 · OSTI ID:1569514
- Sign-OPT: A Query-Efficient Hard-label Adversarial Attack · Conference · Sun Apr 26 00:00:00 EDT 2020 · OSTI ID:1958845