U.S. Department of Energy
Office of Scientific and Technical Information

Active Learning for Language Modeling

Technical Report
DOI: https://doi.org/10.2172/1890039 · OSTI ID: 1890039
 [1];  [1];  [1]
  1. Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Foreign disinformation campaigns undermine national security. Supervised language modeling techniques from natural language processing (NLP) can help to understand and dismantle these campaigns, but they rely heavily on large labeled datasets, which are often labeled by humans. This work addresses that problem with an active learning (AL) framework that generates labeled datasets and leverages human input for detecting disinformation. The AL framework uses task-adaptive pretraining to take full advantage of the unlabeled data and boost the performance of the classifier used for labeling. A disinformation rhetoric metric was developed to measure the presence of common rhetorical techniques used in deceptive text, for use by both the classifier and the human annotator in identifying disinformation. This metric was combined with an uncertainty criterion to create a hybrid acquisition method for AL, which was tested alongside other acquisition functions. A robust stopping strategy was also developed to signal when the AL process should terminate, so that human time is not spent on iterations that would not significantly benefit classifier performance.
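The abstract does not give the form of the hybrid acquisition method or the stopping strategy, so the following is only a minimal sketch of the general idea, not the report's implementation. It assumes that uncertainty is measured by normalized predictive entropy, that the rhetoric metric is already scaled to [0, 1], that the two are combined by a weighted sum, and that stopping is triggered by a run of small validation gains. The names `hybrid_scores`, `select_batch`, `should_stop`, `alpha`, `min_gain`, and `patience` are all hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hybrid_scores(
    class_probs: Sequence[Sequence[float]],
    rhetoric_scores: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Score each unlabeled example by a weighted sum of uncertainty and rhetoric.

    Assumption: a weighted sum with weight `alpha` on uncertainty. The report
    may combine the two signals differently.
    """
    if len(class_probs) != len(rhetoric_scores):
        raise ValueError("one rhetoric score is required per example")
    scores = []
    for probs, rhetoric in zip(class_probs, rhetoric_scores):
        if len(probs) < 2:
            raise ValueError("at least two classes are required")
        # Divide by the maximum possible entropy so uncertainty lies in [0, 1].
        uncertainty = predictive_entropy(probs) / math.log(len(probs))
        scores.append(alpha * uncertainty + (1.0 - alpha) * rhetoric)
    return scores


def select_batch(scores: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the highest-scoring examples to send for labeling."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def should_stop(
    metric_history: Sequence[float],
    min_gain: float = 0.005,
    patience: int = 3,
) -> bool:
    """Stop once each of the last `patience` rounds improved by less than `min_gain`.

    Assumption: a simple patience rule on a validation metric, used here as a
    stand-in for the report's stopping strategy.
    """
    if len(metric_history) <= patience:
        return False
    recent = metric_history[-(patience + 1):]
    gains = [after - before for before, after in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
    rhetoric = [0.0, 1.0, 0.0]
    batch = select_batch(hybrid_scores(probs, rhetoric), batch_size=2)
    print("indices to label:", batch)
    print("stop?", should_stop([0.70, 0.80, 0.801, 0.802, 0.803]))
```

In this toy input the second example is only moderately uncertain but has the highest rhetoric score, so with equal weighting it ranks ahead of the maximally uncertain first example. Adjusting `alpha` moves the selection toward one signal or the other.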
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
NA0003525
OSTI ID:
1890039
Report Number(s):
SAND2022-13312; 710260
Country of Publication:
United States
Language:
English

Similar Records

KEBLM: Knowledge-Enhanced Biomedical Language Models
Journal Article · Thu May 18 20:00:00 EDT 2023 · Journal of Biomedical Informatics · OSTI ID:2420838

Identifying Disinformation Using Rhetorical Devices in Natural Language Models
Technical Report · Thu Sep 01 00:00:00 EDT 2022 · OSTI ID:1891194
