DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on June 28, 2020

Title: Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications

Abstract

Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integration of anomaly detection and visualization is necessary to support human-centered analysis, such as verification of candidate anomalies utilizing domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows toward the exascale scenario. Here, our approach uses a call stack tree representation to encode the structural and temporal information of the function executions. Based on the call stack tree features (e.g., execution time of the root function or vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration in a level-of-detail manner. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified in a real-world HPC application.
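
The abstract does not say which data structures or detectors the authors use, so the following is only a minimal sketch of the general idea, written under stated assumptions. It assumes trace events arrive as well-nested `(timestamp, kind, name)` tuples, builds one call stack tree per root function execution, and flags a root execution when its runtime exceeds a running mean plus `k` standard deviations (Welford's online algorithm). The names `CallNode`, `build_trees`, `RunningStats`, and `detect_anomalies`, the threshold `k`, and the `warmup` count are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class CallNode:
    """One function execution in a call stack tree (hypothetical structure)."""
    name: str
    entry_ts: float
    exit_ts: float = 0.0
    children: list = field(default_factory=list)

    @property
    def runtime(self) -> float:
        return self.exit_ts - self.entry_ts


def build_trees(events):
    """Turn (timestamp, kind, name) events into completed root call trees.

    Assumption: kind is "ENTRY" or "EXIT", and events are well nested
    and sorted by time.
    """
    stack, roots = [], []
    for ts, kind, name in events:
        if kind == "ENTRY":
            node = CallNode(name, ts)
            if stack:
                stack[-1].children.append(node)  # record the invocation relation
            stack.append(node)
        else:
            node = stack.pop()
            node.exit_ts = ts
            if not stack:  # the root function returned, so the tree is complete
                roots.append(node)
    return roots


class RunningStats:
    """Welford's online mean and variance, updated one sample at a time."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_anomalies(roots, k=3.0, warmup=5):
    """Flag root executions whose runtime exceeds mean + k * std seen so far."""
    stats, flagged = {}, []
    for root in roots:
        s = stats.setdefault(root.name, RunningStats())
        if s.n >= warmup and root.runtime > s.mean + k * s.std():
            flagged.append(root)
        s.update(root.runtime)  # statistics are kept per function name
    return flagged


# Demo on synthetic events: nine ordinary "solve" calls, then one slow call.
events, t = [], 0.0
for i in range(10):
    duration = 5.0 if i == 9 else 1.0 + 0.01 * (i % 3)
    events += [
        (t, "ENTRY", "solve"),
        (t + 0.1, "ENTRY", "mpi_send"),
        (t + 0.2, "EXIT", "mpi_send"),
        (t + duration, "EXIT", "solve"),
    ]
    t += duration + 1.0

flagged = detect_anomalies(build_trees(events))
for node in flagged:
    print(node.name, round(node.runtime, 3), [c.name for c in node.children])
```

Each flagged tree keeps its children, so a candidate anomaly can be inspected together with the calls it made. That is the kind of detail the abstract says is handed to the visualization for verification.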

Authors:
 Xie, Cong [1]; Jeong, Wonyong [1]; Matyasfalvi, Gyorgy [2]; Van Dam, Hubertus [2]; Mueller, Klaus [3]; Yoo, Shinjae [2]; Xu, Wei [2]
  1. Stony Brook Univ., NY (United States)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
  3. Stony Brook Univ., NY (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)
Publication Date:
Research Org.:
Brookhaven National Lab. (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
OSTI Identifier:
1560000
Report Number(s):
BNL-212038-2019-JAAM
Grant/Contract Number:  
SC0012704
Resource Type:
Accepted Manuscript
Resource Relation:
Conference: International Conference on Computational Science (ICCS 2019), Faro (Portugal), 12-14 Jun 2019; Related Information: Lecture Notes in Computer Science book series (LNCS, volume 11536)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Anomaly Detection; High Performance Computing; Streaming Analysis; Trace Events; Visual Analytics

Citation Formats

Xie, Cong, Jeong, Wonyong, Matyasfalvi, Gyorgy, Van Dam, Hubertus, Mueller, Klaus, Yoo, Shinjae, and Xu, Wei. Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications. United States: N. p., 2019. Web. doi:10.1007/978-3-030-22734-0_12.
Xie, Cong, Jeong, Wonyong, Matyasfalvi, Gyorgy, Van Dam, Hubertus, Mueller, Klaus, Yoo, Shinjae, & Xu, Wei. Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications. United States. doi:10.1007/978-3-030-22734-0_12.
Xie, Cong, Jeong, Wonyong, Matyasfalvi, Gyorgy, Van Dam, Hubertus, Mueller, Klaus, Yoo, Shinjae, and Xu, Wei. "Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications". United States. doi:10.1007/978-3-030-22734-0_12.
@article{osti_1560000,
title = {Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications},
author = {Xie, Cong and Jeong, Wonyong and Matyasfalvi, Gyorgy and Van Dam, Hubertus and Mueller, Klaus and Yoo, Shinjae and Xu, Wei},
abstractNote = {Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integration of anomaly detection and visualization is necessary to support human-centered analysis, such as verification of candidate anomalies utilizing domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows toward the exascale scenario. Here, our approach uses a call stack tree representation to encode the structural and temporal information of the function executions. Based on the call stack tree features (e.g., execution time of the root function or vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration in a level-of-detail manner. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified in a real-world HPC application.},
doi = {10.1007/978-3-030-22734-0_12},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {6}
}

