Evaluating Deception Detection Model Robustness To Linguistic Variation

Glenski, Maria F.; Ayton, Ellyn M.; Cosbey, Robin J.; Arendt, Dustin L.; Volkova, Svitlana

doi:10.18653/v1/2021.socialnlp-1.6

Conference · Thu Jun 10 04:00:00 EDT 2021

DOI: https://doi.org/10.18653/v1/2021.socialnlp-1.6 · OSTI ID: 1894779

Glenski, Maria F. [1]; Ayton, Ellyn M. [1]; Cosbey, Robin J. [1]; Arendt, Dustin L. [1]; Volkova, Svitlana [1]

[1] BATTELLE (PACIFIC NW LAB)

As automated, machine learning-driven tools see wider use and algorithmic judgements carry greater downstream impact, it is critical to develop models that are robust to evolving or manipulated inputs. An essential part of this effort is evaluating the reliability of multimodal models across linguistic variations, in order to understand their susceptibility both to intentional linguistic adversarial attacks and to natural linguistic variation. We present an extensive analysis of model robustness and susceptibility to linguistic variation in the setting of deceptive news detection, a difficult classification task of growing importance given the impact of misinformation spread online. We evaluate the effectiveness of incorporating adversarial defense strategies and measure model susceptibility to state-of-the-art adversarial attacks using two types of linguistic attacks: character and word perturbations. We consider two multiclass prediction tasks: a 3-way classification of tweets as trustworthy, propaganda, or disinformation, and a 4-way classification as clickbait, hoax, satire, or conspiracy. We compare the performance of three embeddings that have been state-of-the-art for several NLP tasks (GloVe, ELMo, and BERT) to highlight consistent trends in susceptibility, high-confidence misclassifications, and high-impact failures. We find that character or mixed ensemble models are the most effective defense mechanisms and that character perturbations are a more effective attack than word perturbations for deception classification.
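To make the two attack types named in the abstract concrete, the following is a minimal illustrative sketch of a character perturbation and a word perturbation. It is not the authors' method or code: the function names, the adjacent-character swap, the `rate` and `seed` parameters, and the tiny synonym table are all assumptions chosen for illustration.

```python
import random

# Hypothetical synonym table, for illustration only (not from the paper).
SYNONYMS = {"shocking": "surprising", "secret": "hidden", "truth": "facts"}


def char_perturb(text, rate=0.3, seed=0):
    """Swap two adjacent inner characters in a random subset of words.

    Words of length <= 3 are left alone; the first and last characters
    of a perturbed word are never moved.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # inner position to swap
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def word_perturb(text):
    """Replace words found in the synonym table (case-insensitive lookup)."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())


if __name__ == "__main__":
    tweet = "officials deny shocking secret report"
    print(char_perturb(tweet, rate=1.0))
    print(word_perturb(tweet))
```

A robustness evaluation in this spirit would classify both the original and the perturbed text and compare the predictions; the classifier itself is outside the scope of this sketch.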

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1894779

Report Number(s): PNNL-SA-156922

Country of Publication: United States

Language: English

Similar Records

Explaining Multimodal Deceptive News Prediction Models

Conference · Sat Jul 06 00:00:00 EDT 2019 · OSTI ID:1532355

Misleading or Falsification? Inferring Deceptive Strategies and Types in Online News and Social Media

Conference · Fri Apr 27 00:00:00 EDT 2018 · OSTI ID:1435892

Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter

Conference · Sun Jul 30 00:00:00 EDT 2017 · OSTI ID:1373869

Related Subjects

ML T&E
adversarial evaluation
artificial intelligence
deception detection
deep learning
deep learning (DL)
disinformation
explainable AI
machine learning (ML)
model evaluation
natural language processing
robustness
Artificial intelligence (AI) / machine learning (ML)
