OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Visual Exploration of Semantic Relationships in Neural Word Embeddings

Abstract

Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). But, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. Particularly, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. We introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.

Authors:
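Liu, Shusen [1]; Bremer, Peer-Timo [1]; Thiagarajan, Jayaraman J. [1]; Srikumar, Vivek [2]; Wang, Bei [3]; Livnat, Yarden [3]; Pascucci, Valerio [3]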
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  2. Univ. of Utah, Salt Lake City, UT (United States). School of Computing
  3. Univ. of Utah, Salt Lake City, UT (United States). SCI Inst.
Publication Date:
2017-08-29
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
OSTI Identifier:
1416496
Report Number(s):
LLNL-JRNL-741817
Journal ID: ISSN 1077-2626
Grant/Contract Number:  
AC52-07NA27344; SC0007446; NA0002375; SC0010498
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
IEEE Transactions on Visualization and Computer Graphics
Additional Journal Information:
Journal Volume: 24; Journal Issue: 1; Journal ID: ISSN 1077-2626
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

Citation Formats

Liu, Shusen, Bremer, Peer-Timo, Thiagarajan, Jayaraman J., Srikumar, Vivek, Wang, Bei, Livnat, Yarden, and Pascucci, Valerio. Visual Exploration of Semantic Relationships in Neural Word Embeddings. United States: N. p., 2017. Web. doi:10.1109/TVCG.2017.2745141.
Liu, Shusen, Bremer, Peer-Timo, Thiagarajan, Jayaraman J., Srikumar, Vivek, Wang, Bei, Livnat, Yarden, & Pascucci, Valerio. (2017). Visual Exploration of Semantic Relationships in Neural Word Embeddings. United States. doi:10.1109/TVCG.2017.2745141.
Liu, Shusen, Bremer, Peer-Timo, Thiagarajan, Jayaraman J., Srikumar, Vivek, Wang, Bei, Livnat, Yarden, and Pascucci, Valerio. 2017. "Visual Exploration of Semantic Relationships in Neural Word Embeddings". United States. doi:10.1109/TVCG.2017.2745141. https://www.osti.gov/servlets/purl/1416496.
@article{osti_1416496,
title = {Visual Exploration of Semantic Relationships in Neural Word Embeddings},
author = {Liu, Shusen and Bremer, Peer-Timo and Thiagarajan, Jayaraman J. and Srikumar, Vivek and Wang, Bei and Livnat, Yarden and Pascucci, Valerio},
abstractNote = {Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). But, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. Particularly, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. We introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.},
doi = {10.1109/TVCG.2017.2745141},
journal = {IEEE Transactions on Visualization and Computer Graphics},
number = 1,
volume = 24,
place = {United States},
year = {2017},
month = aug
}

Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science