VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data
Abstract
We present VisIRR, an interactive visual information retrieval and recommendation system for large-scale document discovery. VisIRR combines two paradigms: (1) a passive pull, in which users retrieve documents through queries, and (2) an active push, in which the system recommends items of potential interest based on user preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, showing the relationships among the topics along with their keyword summaries. In addition, based on interactive, personalized preference feedback on documents, VisIRR recommends documents from the entire corpus, beyond the retrieved set. The recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods that make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works through detailed usage scenarios. Finally, we present preliminary user study results that evaluate the effectiveness of the system.
- Authors:
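- Choo, Jaegul; Kim, Hannah; Clarkson, Edward; Liu, Zhicheng; Lee, Changhyun; Li, Fuxin; Lee, Hanseung; Kannan, Ramakrishnan; Stolper, Charles D.; Stasko, John; Park, Haesun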
- Korea University, Seoul (South Korea)
- Georgia Inst. of Technology, Atlanta, GA (United States)
- Adobe Research, Seattle, WA (United States)
- Google Inc., Mountain View, CA (United States)
- Oregon State University, Corvallis, OR (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Southwestern University, Georgetown, TX (United States)
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1426558
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- ACM Transactions on Knowledge Discovery from Data
- Additional Journal Information:
- Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 1556-4681
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION
Citation Formats
Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, and Park, Haesun. VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data. United States: N. p., 2018. Web. doi:10.1145/3070616.
Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, & Park, Haesun. VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data. United States. https://doi.org/10.1145/3070616
Choo, Jaegul, Kim, Hannah, Clarkson, Edward, Liu, Zhicheng, Lee, Changhyun, Li, Fuxin, Lee, Hanseung, Kannan, Ramakrishnan, Stolper, Charles D., Stasko, John, and Park, Haesun. 2018. "VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data". United States. https://doi.org/10.1145/3070616. https://www.osti.gov/servlets/purl/1426558.
@article{osti_1426558,
title = {VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data},
author = {Choo, Jaegul and Kim, Hannah and Clarkson, Edward and Liu, Zhicheng and Lee, Changhyun and Li, Fuxin and Lee, Hanseung and Kannan, Ramakrishnan and Stolper, Charles D. and Stasko, John and Park, Haesun},
doi = {10.1145/3070616},
url = {https://www.osti.gov/biblio/1426558},
journal = {ACM Transactions on Knowledge Discovery from Data},
issn = {1556-4681},
number = 1,
volume = 12,
place = {United States},
year = {2018},
month = {1}
}