OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data

Abstract

In this paper, we present an interactive visual information retrieval and recommendation system, called VisIRR, for large-scale document discovery. VisIRR effectively combines the paradigms of (1) a passive pull through query processes for retrieval and (2) an active push that recommends items of potential interest to users based on their preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, representing the relationships among the topics along with their keyword summary. In addition, based on interactive personalized preference feedback with regard to documents, VisIRR provides document recommendations from the entire corpus, which are beyond the retrieved sets. Such recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods, which make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works by providing detailed usage scenarios. Finally, we present preliminary user study results for evaluating the effectiveness of the system.
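The "pull" and "push" paradigms described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify the algorithms, so the bag-of-words vectors, the cosine-similarity ranking, and the function names `vectorize`, `retrieve`, and `recommend` below are all assumptions made for illustration only. The sketch retrieves documents matching a query, then recommends documents from the rest of the corpus that resemble the ones a user marked as liked.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, simplistic text representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(corpus, query):
    """Pull: ids of documents that contain every query term."""
    terms = query.lower().split()
    return [i for i, doc in enumerate(corpus)
            if all(t in vectorize(doc) for t in terms)]


def recommend(corpus, liked_ids, exclude_ids, k=2):
    """Push: rank documents outside exclude_ids by similarity to liked ones."""
    # Build a preference profile by summing the liked documents' vectors.
    profile = Counter()
    for i in liked_ids:
        profile.update(vectorize(corpus[i]))
    excluded = set(exclude_ids)
    candidates = [i for i in range(len(corpus)) if i not in excluded]
    ranked = sorted(candidates,
                    key=lambda i: cosine(profile, vectorize(corpus[i])),
                    reverse=True)
    return ranked[:k]


# Hypothetical four-document corpus, for illustration only.
corpus = [
    "visual analytics for document clustering",
    "interactive topic modeling of document collections",
    "topic modeling with matrix factorization",
    "protein folding simulation on supercomputers",
]

retrieved = retrieve(corpus, "document")                  # pull step
recommended = recommend(corpus, [1], retrieved, k=1)      # push step
print(retrieved, recommended)
```

Here the recommendation is drawn only from documents outside the retrieved set, mirroring the abstract's point that recommendations come from the entire corpus rather than from the query results alone.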

Authors:
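Choo, Jaegul [1]; Kim, Hannah [2]; Clarkson, Edward [2]; Liu, Zhicheng [3]; Lee, Changhyun [4]; Li, Fuxin [5]; Lee, Hanseung [4]; Kannan, Ramakrishnan [6]; Stolper, Charles D. [7]; Stasko, John [2]; Park, Haesun [2]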
  1. Korea University, Seoul (South Korea)
  2. Georgia Inst. of Technology, Atlanta, GA (United States)
  3. Adobe Research, Seattle, WA (United States)
  4. Google Inc., Mountain View, CA (United States)
  5. Oregon State University, Corvallis, OR (United States)
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  7. Southwestern University, Georgetown, TX (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1426558
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Additional Journal Information:
Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 1556-4681
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, and Park, Haesun. VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data. United States: N. p., 2018. Web. doi:10.1145/3070616.
Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, & Park, Haesun. VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data. United States. https://doi.org/10.1145/3070616
Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, and Park, Haesun. 2018. "VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data". United States. https://doi.org/10.1145/3070616. https://www.osti.gov/servlets/purl/1426558.
@article{osti_1426558,
title = {VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data},
author = {Choo, Jaegul and Kim, Hannah and Clarkson, Edward and Liu, Zhicheng and Lee, Changhyun and Li, Fuxin and Lee, Hanseung and Kannan, Ramakrishnan and Stolper, Charles D. and Stasko, John and Park, Haesun},
abstractNote = {In this paper, we present an interactive visual information retrieval and recommendation system, called VisIRR, for large-scale document discovery. VisIRR effectively combines the paradigms of (1) a passive pull through query processes for retrieval and (2) an active push that recommends items of potential interest to users based on their preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, representing the relationships among the topics along with their keyword summary. In addition, based on interactive personalized preference feedback with regard to documents, VisIRR provides document recommendations from the entire corpus, which are beyond the retrieved sets. Such recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods, which make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works by providing detailed usage scenarios. Finally, we present preliminary user study results for evaluating the effectiveness of the system.},
doi = {10.1145/3070616},
url = {https://www.osti.gov/biblio/1426558},
journal = {ACM Transactions on Knowledge Discovery from Data},
issn = {1556-4681},
number = 1,
volume = 12,
place = {United States},
year = {2018},
month = {1}
}

Works referencing / citing this record:

PaperPoles: Facilitating adaptive visual exploration of scientific publications by citation links
journal, February 2019