Evaluating Deception Detection Model Robustness To Linguistic Variation
- BATTELLE (PACIFIC NW LAB)
With the increasing use of automated, machine learning-driven tools and the downstream impact that algorithmic judgements can have, it is critical to develop models that are robust to evolving or manipulated inputs. To that end, it is essential to evaluate the reliability of multimodal models across linguistic variations, so as to understand their susceptibility both to intentional linguistic adversarial attacks and to natural linguistic variation. We present an extensive analysis of model robustness and susceptibility to linguistic variation in the setting of deceptive news detection, a difficult classification task that is increasingly important given the impact of misinformation spread online. We measure model susceptibility to state-of-the-art adversarial attacks using two types of linguistic attacks, character perturbations and word perturbations, and evaluate the effectiveness of incorporating adversarial defense strategies. We consider two multiclass prediction tasks: a 3-way classification of tweets as trustworthy, propaganda, or disinformation, and a 4-way classification as clickbait, hoax, satire, or conspiracy. We compare the performance of three embeddings that have been state-of-the-art for several NLP tasks (GloVe, ELMo, and BERT) to highlight consistent trends in susceptibility, high-confidence misclassifications, and high-impact failures. We find that character or mixed ensemble models are the most effective defense mechanisms and that character perturbations are a more effective attack than word perturbations for deception classification.
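To make the two attack families named in the abstract concrete, the following is a minimal sketch of a character perturbation and a word perturbation applied to a tweet-like string. It is an illustration only, not the attack implementations evaluated in the report: the function names, the adjacent-character swap, the `rate` and `seed` parameters, and the tiny `SUBSTITUTIONS` table are all assumptions introduced here.

```python
import random


def perturb_characters(text, rate=0.3, seed=0):
    """Swap two adjacent interior characters in a random fraction of words."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        # Only touch words long enough to have two interior characters.
        if len(word) >= 4 and rng.random() < rate:
            # Pick the left index of the swap; first and last characters stay fixed.
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def perturb_words(text, substitutions, rate=0.3, seed=0):
    """Replace a random fraction of words that have an entry in `substitutions`."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        options = substitutions.get(word.lower())
        if options and rng.random() < rate:
            word = rng.choice(options)
        out.append(word)
    return " ".join(out)


# Hypothetical substitution table, for illustration only.
SUBSTITUTIONS = {"shocking": ["surprising"], "secret": ["hidden"]}

tweet = "shocking secret report leaked today"
print(perturb_characters(tweet, rate=1.0))
print(perturb_words(tweet, SUBSTITUTIONS, rate=1.0))
```

A robustness evaluation in this spirit would feed both the original and the perturbed text to a trained classifier and compare the predictions; the classifier itself is outside the scope of this sketch.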
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1894779
- Report Number(s): PNNL-SA-156922
- Country of Publication: United States
- Language: English