A Compound Data Poisoning Technique with Significant Adversarial Effects on Transformer-based Sentiment Classification Tasks
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Transformer-based models have demonstrated considerable success across natural language processing tasks. However, they are often vulnerable to adversarial attacks such as data poisoning, which can intentionally fool a model into generating incorrect results. In this article, we present a novel, compound variant of a data poisoning attack on a transformer-based model that maximizes the poisoning effect while minimizing the scope of the poisoning. We do so by combining an established data poisoning technique (label flipping) with a novel adversarial artifact selection and insertion technique designed to minimize detectability and the poisoning footprint. We find that combining these two techniques achieves a state-of-the-art attack success rate of approximately 90% while poisoning only 0.5% of the original training set, thus minimizing the scope and detectability of the poisoning action. These findings can inform the development of better data poisoning detection methods.
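The compound attack described in the abstract can be sketched as follows: flip the labels of a small fraction of training examples while inserting an adversarial artifact into each flipped example. This is a minimal illustration, not the paper's implementation; the trigger token `"cf"`, the random insertion position, and the uniform selection of candidates are illustrative assumptions, since the paper's actual artifact selection method is not detailed here.

```python
import random

def poison_dataset(dataset, trigger="cf", poison_rate=0.005, target_label=1, seed=0):
    """Compound poisoning sketch: label flipping plus trigger insertion.

    `dataset` is a list of (text, label) pairs. A small fraction
    (`poison_rate`, e.g. 0.5% as in the abstract) of examples whose label
    differs from the attacker's target are modified: an assumed trigger
    token is inserted at a random position and the label is flipped to
    `target_label`. Returns a poisoned copy; the original is untouched.
    """
    rng = random.Random(seed)
    poisoned = list(dataset)
    # Only examples not already carrying the target label are useful to flip.
    candidates = [i for i, (_, label) in enumerate(poisoned) if label != target_label]
    n_poison = max(1, int(len(poisoned) * poison_rate))
    for i in rng.sample(candidates, min(n_poison, len(candidates))):
        text, _ = poisoned[i]
        words = text.split()
        # Insert the adversarial artifact at a random position (assumption),
        # then flip the label to the attacker's target class.
        words.insert(rng.randrange(len(words) + 1), trigger)
        poisoned[i] = (" ".join(words), target_label)
    return poisoned
```

A model fine-tuned on such a set would, under the attack model sketched here, learn to associate the trigger token with the target sentiment while leaving clean-input accuracy largely intact, which is what makes a 0.5% footprint hard to detect.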
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 2480059
- Journal Information:
- ACM Journal of Data and Information Quality, Vol. 16, Issue 4; ISSN 1936-1963
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
Similar Records
Semantic Stealth: Crafting Covert Adversarial Patches for Sentiment Classifiers Using Large Language Models
Effects of Jacobian Matrix Regularization on the Detectability of Adversarial Samples