Identification of threats using linguistics-based knowledge extraction.

Chew, Peter A

doi:10.2172/940522

Technical Report · September 1, 2008

One of the challenges increasingly facing intelligence analysts, along with professionals in many other fields, is the vast amount of data that must be reviewed and converted into meaningful information, and ultimately into rational, wise decisions by policy makers. The advent of the World Wide Web (WWW) has magnified this challenge. A key hypothesis that has guided us is that threats come from ideas (or ideology), and ideas are almost always put into writing before the threats materialize. While in the past the 'writing' might have taken the form of pamphlets or books, today's medium of choice is the WWW, precisely because it is a decentralized, flexible, and low-cost method of reaching a wide audience. However, a factor that complicates matters for the analyst is that material published on the WWW may be in any of a large number of languages. In 'Identification of Threats Using Linguistics-Based Knowledge Extraction', we have sought to use Latent Semantic Analysis (LSA) and similar text analysis techniques to map documents from the WWW, in whatever language they were originally written, to a common language-independent vector-based representation. This opens up several possibilities. First, similar documents can be found across language boundaries. Second, a set of documents in multiple languages can be visualized in a graphical representation. These alone offer potentially useful tools and capabilities to intelligence analysts whose knowledge of foreign languages may be limited. Finally, we can test the overarching hypothesis (that ideology, and more specifically ideology that represents a threat, can be detected solely from the words that express it) by using the vector-based representation of documents to predict additional features, such as the ideology, within a framework based on supervised learning. In this report, we present the results of a three-year project of the same name.
We believe these results clearly demonstrate the general feasibility of such an approach. Nevertheless, obstacles remain, relating primarily to how 'ideology' should be defined. We discuss these and point to possible solutions.
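The abstract outlines three steps: map documents in any language into one LSA vector space, compare them across languages, and predict a label from the vectors with supervised learning. The following is a minimal, dependency-free Python sketch of those steps under several assumptions that are ours, not the report's. The tiny English/Spanish corpus and the labels "conflict" and "commerce" are hypothetical. The language-independent space is obtained by training on a parallel corpus (each training document joins a passage with its translation, so translation equivalents co-occur), which is one common approach; the abstract does not say how the mapping was built. The truncated SVD of the term-by-document matrix A is computed by power iteration with deflation on A Aᵀ, new documents are folded in as d̂ = S⁻¹Uᵀd, and a nearest-centroid classifier stands in for whatever supervised learner the project actually used.

```python
import math
from collections import Counter

# Hypothetical parallel training corpus: each document is a passage plus its
# translation, so that translation-equivalent terms co-occur.
PARALLEL_DOCS = [
    "war attack enemy guerra ataque enemigo",
    "peace trade market paz comercio mercado",
    "attack bomb threat ataque bomba amenaza",
    "market price trade mercado precio comercio",
]


def term_vector(text, vocab):
    """Raw term counts over a fixed vocabulary; unknown words are ignored.
    (Real LSA systems usually apply a weighting such as log-entropy.)"""
    counts = Counter(text.split())
    return [float(counts[t]) for t in vocab]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    pairs = []
    for _ in range(k):
        x = [1.0] * n  # positive start vector
        for _ in range(iters):
            y = mat_vec(m, x)
            norm = math.sqrt(dot(y, y))
            if norm == 0.0:
                break
            x = [v / norm for v in y]
        lam = dot(x, mat_vec(m, x))  # Rayleigh quotient = eigenvalue estimate
        pairs.append((lam, x))
        # Deflate: subtract the component just found so the next one dominates.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * x[i] * x[j]
    return pairs


def train_lsa(docs, k):
    """Rank-k LSA basis. k must not exceed the rank of A (else sigma = 0)."""
    vocab = sorted({t for d in docs for t in d.split()})
    cols = [term_vector(d, vocab) for d in docs]  # one column of A per document
    n = len(vocab)
    # Eigenvectors of A A^T are the left singular vectors U; eigenvalues are sigma^2.
    aat = [[sum(c[i] * c[j] for c in cols) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(aat, k)
    u = [vec for _, vec in pairs]
    sigma = [math.sqrt(max(lam, 0.0)) for lam, _ in pairs]
    return vocab, u, sigma


def fold_in(text, vocab, u, sigma):
    """Project a new, possibly monolingual, document: d_hat = S^-1 U^T d."""
    d = term_vector(text, vocab)
    return [dot(ui, d) / si for ui, si in zip(u, sigma)]


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


def nearest_centroid(query, labeled):
    """Return the label whose centroid is most cosine-similar to the query."""
    by_label = {}
    for vec, label in labeled:
        by_label.setdefault(label, []).append(vec)
    centroids = {lab: [sum(col) / len(vs) for col in zip(*vs)]
                 for lab, vs in by_label.items()}
    return max(centroids, key=lambda lab: cosine(query, centroids[lab]))


vocab, u, sigma = train_lsa(PARALLEL_DOCS, k=2)
en_threat = fold_in("enemy attack threat", vocab, u, sigma)
es_threat = fold_in("amenaza bomba ataque", vocab, u, sigma)
es_market = fold_in("precio mercado comercio", vocab, u, sigma)
print("en/es threat:", cosine(en_threat, es_threat))
print("en threat/es market:", cosine(en_threat, es_market))

# Labels come only from English examples; the query is Spanish.
labeled = [
    (fold_in("war enemy attack", vocab, u, sigma), "conflict"),
    (fold_in("trade price market", vocab, u, sigma), "commerce"),
]
print(nearest_centroid(es_threat, labeled))  # prints "conflict"
```

On this toy data the English and Spanish threat documents should score a cosine close to 1 and the English threat and Spanish market documents close to 0, and the Spanish document receives a label learned only from English examples. Power iteration is used here just to avoid a NumPy dependency; with NumPy, `numpy.linalg.svd` would be the natural way to compute the decomposition.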

Research Organization: Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

OSTI ID: 940522

Report Number(s): SAND2008-6104; TRN: US200824%%190

Country of Publication: United States

Language: English

Similar Records

Algorithms and architectures for high performance analysis of semantic graphs.

Technical Report · September 1, 2005

Hendrickson, Bruce Alan

Attack Surface Analysis of the Digital Twins interface with Advanced Sensor and Instrumentation Interfaces: Cyber Threat Assessment and Attack Demonstration for Digital Twins in Advanced Reactor Architectures - M3CT-23IN1105033

Technical Report · October 1, 2023

Spirito, Christopher M; Ly, Jeff; Veretennikov, Michael

A Scalable HPC Insider Threat Monitoring System

Technical Report · March 10, 2018

Shevenell, Michael John

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
HYPOTHESIS
LEARNING
SABOTAGE
DETECTION
FEASIBILITY STUDIES
STANDARDIZED TERMINOLOGY
INFORMATION RETRIEVAL
Military intelligence.
Computational linguistics.
Semantics.
Computational intelligence.
Linguistics
Applied linguistics
