U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications

Journal Article · Lecture Notes in Computer Science
 [1];  [1];  [2];  [2];  [3];  [2];  [2]
  1. Stony Brook Univ., NY (United States)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
  3. Stony Brook Univ., NY (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)

Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integrating anomaly detection with visualization is necessary to support human-centered analysis, such as verifying candidate anomalies using domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows, targeting exascale scenarios. Our approach uses a call stack tree representation to encode the structural and temporal information of function executions. Based on call stack tree features (e.g., the execution time of the root function or a vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration at multiple levels of detail. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified on a real-world HPC application.
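The abstract does not specify data structures or a particular detector, so the following is only a minimal Python sketch of the general idea under stated assumptions. It models one function execution as a call stack tree node with entry and exit timestamps. It derives a feature vector from the root's execution time plus per-function call counts, and it flags candidate anomalies online. All names (`CallNode`, `tree_features`, `OnlineDetector`) are hypothetical. The bag-of-functions count vector is a simple stand-in for the paper's "vector representation of the tree structure". The detector is a swapped-in k-sigma threshold on a running mean and variance (Welford's algorithm), not necessarily the authors' method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One function execution in a call stack tree (hypothetical structure)."""
    name: str
    entry: float  # timestamp when the function was entered
    exit: float   # timestamp when the function returned
    children: list[CallNode] = field(default_factory=list)

    @property
    def inclusive_time(self) -> float:
        # Execution time of this call, including time spent in its callees.
        return self.exit - self.entry


def tree_features(root: CallNode, vocab: list[str]) -> list[float]:
    """Root execution time followed by call counts for each function in vocab.

    The count vector is an assumed, simplified stand-in for a tree-structure
    embedding; it ignores ordering and nesting depth.
    """
    counts = {name: 0 for name in vocab}
    stack = [root]
    while stack:  # iterative depth-first traversal of the tree
        node = stack.pop()
        if node.name in counts:
            counts[node.name] += 1
        stack.extend(node.children)
    return [root.inclusive_time] + [float(counts[n]) for n in vocab]


class OnlineDetector:
    """Flags values more than k standard deviations above the running mean.

    Assumed k-sigma rule; mean and variance are maintained in O(1) memory
    with Welford's algorithm so the detector can run on a stream.
    """

    def __init__(self, k: float = 3.0, warmup: int = 10):
        self.k = k
        self.warmup = max(2, warmup)  # need >= 2 samples for a sample variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = x > self.mean + self.k * std
        # Welford update (this sketch also folds flagged values into the stats).
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


# Usage: one small call stack tree and a stream of root execution times.
root = CallNode("main", 0.0, 10.0, [
    CallNode("MPI_Send", 1.0, 2.0),
    CallNode("compute", 2.0, 9.0, [CallNode("MPI_Send", 3.0, 3.5)]),
])
features = tree_features(root, ["compute", "MPI_Send"])  # [10.0, 1.0, 2.0]

detector = OnlineDetector(k=3.0, warmup=10)
times = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.01, 0.99, 5.0]
flags = [detector.update(t) for t in times]  # only the final 5.0 is flagged
```

A streaming detector of this kind keeps only constant state per monitored function, which suits the online setting the abstract describes. Candidates it flags would then be handed to the visualization views for human verification rather than treated as confirmed anomalies.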

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research
Grant/Contract Number:
SC0012704
OSTI ID:
1560000
Report Number(s):
BNL-212038-2019-JAAM
Journal Information:
Lecture Notes in Computer Science, Vol. LNCS 11536; Conference: International Conference on Computational Science (ICCS 2019), Faro (Portugal), 12-14 Jun 2019; Related Information: Lecture Notes in Computer Science book series (LNCS, volume 11536); ISSN 0302-9743
Publisher:
Springer
Country of Publication:
United States
Language:
English

Figures / Tables (7)