OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Graph Sampling for Visual Analytics

Abstract

Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
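
The three sampling categories named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's implementations, so everything below is an assumption for illustration only: the function names (`node_sample`, `edge_sample`, `random_walk_sample`), the adjacency-dictionary graph representation, the ring-shaped test graph, and the choice of a simple random walk as the traversal-based method.

```python
import random


def make_ring_graph(n, k=2):
    """Hypothetical test graph: each node links to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def edges_of(adj):
    """Return the undirected edge set as (smaller, larger) node pairs."""
    return {(u, v) for u in adj for v in adj[u] if u < v}


def induced(adj, nodes):
    """Subgraph induced by `nodes`: keep only edges with both endpoints kept."""
    nodes = set(nodes)
    return {v: adj[v] & nodes for v in nodes}


def node_sample(adj, rate, rng):
    """Node sampling: pick nodes uniformly at random, keep the induced subgraph."""
    k = max(1, round(rate * len(adj)))
    return induced(adj, rng.sample(sorted(adj), k))


def edge_sample(adj, rate, rng):
    """Edge sampling: pick edges uniformly at random, keep their endpoints."""
    edges = sorted(edges_of(adj))
    k = max(1, round(rate * len(edges)))
    sub = {}
    for u, v in rng.sample(edges, k):
        sub.setdefault(u, set()).add(v)
        sub.setdefault(v, set()).add(u)
    return sub


def random_walk_sample(adj, rate, rng, max_steps=100_000):
    """Traversal-based sampling: walk the graph until enough distinct nodes are seen."""
    target = max(1, round(rate * len(adj)))
    current = rng.choice(sorted(adj))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= target or not adj[current]:
            break
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return induced(adj, visited)


if __name__ == "__main__":
    rng = random.Random(0)
    graph = make_ring_graph(100)
    for sampler in (node_sample, edge_sample, random_walk_sample):
        sample = sampler(graph, 0.2, rng)
        print(sampler.__name__, len(sample), "nodes,", len(edges_of(sample)), "edges")
```

Even on this toy graph, the three samplers retain different numbers of nodes and edges at the same sampling rate, which is the kind of difference the evaluation described above compares across graph models and sampling rates.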

Authors:
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Publication Date:
2017-07-01
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1406791
Report Number(s):
PNNL-SA-122634
Journal ID: ISSN 1062-3701; 453040300
DOE Contract Number:
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: Journal of Imaging Science and Technology; Journal Volume: 61; Journal Issue: 4
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Big graphs; Graph sampling; Graph properties; Graph drawing; Visualization

Citation Formats

Zhang, Fangyan, Zhang, Song, and Chung Wong, Pak. Graph Sampling for Visual Analytics. United States: N. p., 2017. Web. doi:10.2352/J.ImagingSci.Technol.2017.61.4.040503.
Zhang, Fangyan, Zhang, Song, & Chung Wong, Pak. Graph Sampling for Visual Analytics. United States. doi:10.2352/J.ImagingSci.Technol.2017.61.4.040503.
Zhang, Fangyan, Zhang, Song, and Chung Wong, Pak. 2017. "Graph Sampling for Visual Analytics". United States. doi:10.2352/J.ImagingSci.Technol.2017.61.4.040503.
@article{osti_1406791,
title = {Graph Sampling for Visual Analytics},
author = {Zhang, Fangyan and Zhang, Song and Chung Wong, Pak},
abstractNote = {Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.},
doi = {10.2352/J.ImagingSci.Technol.2017.61.4.040503},
journal = {Journal of Imaging Science and Technology},
number = 4,
volume = 61,
place = {United States},
year = {2017},
month = {7}
}
  • We present a visual analytics technique to explore graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated use of brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization but also the advantages and strengths of our signature-guided approach presented in the paper.
  • The evaluation of visual analytics environments was a topic in Illuminating the Path [Thomas 2005] as a critical aspect of moving research into practice. For a thorough understanding of the utility of the systems available, evaluation not only involves assessing the visualizations, interactions or data processing algorithms themselves, but also the complex processes that a tool is meant to support (such as exploratory data analysis and reasoning, communication through visualization, or collaborative data analysis [Lam 2012; Carpendale 2007]). Researchers and practitioners in the field have long identified many of the challenges faced when planning, conducting, and executing an evaluation of a visualization tool or system [Plaisant 2004]. Evaluation is needed to verify that algorithms and software systems work correctly and that they represent improvements over the current infrastructure. Additionally, to effectively transfer new software into a working environment, it is necessary to ensure that the software has utility for the end-users and that the software can be incorporated into the end-user’s infrastructure and work practices. Evaluation test beds require datasets, tasks, metrics and evaluation methodologies. As noted in [Thomas 2005], it is difficult and expensive for any one researcher to set up an evaluation test bed, so in many cases evaluation is set up for communities of researchers or for various research projects or programs. Examples of successful community evaluations can be found in [Chinchor 1993; Voorhees 2007; FRGC 2012]. As visual analytics environments are intended to facilitate the work of human analysts, one aspect of evaluation needs to focus on the utility of the software to the end-user. This requires representative users, representative tasks, and metrics that measure the utility to the end-user. This is even more difficult as now one aspect of the test methodology is access to representative end-users to participate in the evaluation.
In many cases the sensitive nature of data and tasks and difficult access to busy analysts puts even more of a burden on researchers to complete this type of evaluation. User-centered design goes beyond evaluation and starts with the user [Beyer 1997, Shneiderman 2009]. Having some knowledge of the type of data, tasks, and work practices helps researchers and developers know the correct paths to pursue in their work. When access to the end-users is problematic at best and impossible at worst, user-centered design becomes difficult. Researchers are unlikely to go to work on the type of problems faced by inaccessible users. Commercial vendors have difficulties evaluating and improving their products when they cannot observe real users working with their products. In well-established fields such as web site design or office software design, user-interface guidelines have been developed based on the results of empirical studies or the experience of experts. Guidelines can speed up the design process and replace some of the need for observation of actual users [heuristics review references]. In 2006 when the visual analytics community was initially getting organized, no such guidelines existed. Therefore, we were faced with the problem of developing an evaluation framework for the field of visual analytics that would provide representative situations and datasets, representative tasks and utility metrics, and finally a test methodology which would include a surrogate for representative users, increase interest in conducting research in the field, and provide sufficient feedback to the researchers so that they could improve their systems.
  • Multimedia analysis has focused on images, video, and to some extent audio and has made progress in single channels excluding text. Visual analytics has focused on the user interaction with data during the analytic process plus the fundamental mathematics and has continued to treat text as did its precursor, information visualization. The general problem we address in this tutorial is the combining of multimedia analysis and visual analytics to deal with multimedia information gathered from different sources, with different goals or objectives, and containing all media types and combinations in common usage.
  • The term Visual Analytics has been around for almost five years by now, but still there are on-going discussions about what it actually is and in particular what is new about it. The core of our view on Visual Analytics is the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analytics. In this paper, we outline the scope of Visual Analytics using two problem and three methodological classes in order to work out the need for and purpose of Visual Analytics. Thereby, the respective methods are explained plus examples of analytic reasoning interaction leading to a glimpse into the future of how Visual Analytics methods will enable us to go beyond what is possible when separately using the two methods.
  • Graph Analytics -- Selected Lessons Learned and Challenges Ahead