OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: Ordering Traces Logically to Identify Lateness in Message Passing Programs

Abstract

Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure from traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. We apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.
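
The two ideas in the abstract, ordering events by happened-before relationships and measuring an operation's delay relative to its peers, can be illustrated with a small sketch. This is not the paper's algorithm (which also accounts for developer intent) and not Ravel's code; it is a simplified illustration under stated assumptions. Each event is assigned a logical step one greater than the latest step among its predecessors (the previous event on the same process and, for a receive, the matching send), and lateness is assumed here to be an event's exit time minus the earliest exit time among events at the same step. All names (`logical_steps`, `lateness`, the event identifiers, the timestamps) are hypothetical.

```python
from collections import defaultdict

def logical_steps(events, messages):
    """Assign each event a logical step from happened-before edges.

    events:   dict mapping process id -> list of event ids in program order
    messages: list of (send_event, recv_event) pairs
    """
    preds = defaultdict(list)
    for evs in events.values():
        for earlier, later in zip(evs, evs[1:]):
            preds[later].append(earlier)   # program order on one process
    for send, recv in messages:
        preds[recv].append(send)           # a send happens before its receive

    step = {}

    def visit(e):
        # An event with no predecessors gets step 0.
        if e not in step:
            step[e] = 1 + max((visit(p) for p in preds[e]), default=-1)
        return step[e]

    for evs in events.values():
        for e in evs:
            visit(e)
    return step

def lateness(step, exit_time):
    """Delay of each event relative to the earliest peer at the same step."""
    earliest = {}
    for e, s in step.items():
        earliest[s] = min(earliest.get(s, exit_time[e]), exit_time[e])
    return {e: exit_time[e] - earliest[step[e]] for e in step}

# Hypothetical three-process ring: every process sends, then receives.
events = {0: ["s0", "r0"], 1: ["s1", "r1"], 2: ["s2", "r2"]}
messages = [("s0", "r1"), ("s1", "r2"), ("s2", "r0")]
exit_time = {"s0": 10, "s1": 10, "s2": 15, "r0": 25, "r1": 20, "r2": 20}

step = logical_steps(events, messages)
late = lateness(step, exit_time)
```

In this made-up trace all sends land on step 0 and all receives on step 1. Process 2's send finishes 5 time units after its peers, and that delay reappears at the receive on process 0 that depends on it, which is the kind of propagation the abstract describes.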

Authors:
Isaacs, Katherine E. [1]; Gamblin, Todd [2]; Bhatele, Abhinav [2]; Schulz, Martin [2]; Hamann, Bernd [1]; Bremer, Peer-Timo [2]
  1. Univ. of California, Davis, CA (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
2015-03-30
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1410024
Report Number(s):
LLNL-JRNL-668754
Journal ID: ISSN 1045-9219
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 27; Journal Issue: 3; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; trace analysis; performance

Citation Formats

Isaacs, Katherine E., Gamblin, Todd, Bhatele, Abhinav, Schulz, Martin, Hamann, Bernd, and Bremer, Peer-Timo. Ordering Traces Logically to Identify Lateness in Message Passing Programs. United States: N. p., 2015. Web. doi:10.1109/TPDS.2015.2417531.
Isaacs, Katherine E., Gamblin, Todd, Bhatele, Abhinav, Schulz, Martin, Hamann, Bernd, & Bremer, Peer-Timo. Ordering Traces Logically to Identify Lateness in Message Passing Programs. United States. doi:10.1109/TPDS.2015.2417531.
Isaacs, Katherine E., Gamblin, Todd, Bhatele, Abhinav, Schulz, Martin, Hamann, Bernd, and Bremer, Peer-Timo. 2015. "Ordering Traces Logically to Identify Lateness in Message Passing Programs". United States. doi:10.1109/TPDS.2015.2417531. https://www.osti.gov/servlets/purl/1410024.
@article{osti_1410024,
title = {Ordering Traces Logically to Identify Lateness in Message Passing Programs},
author = {Isaacs, Katherine E. and Gamblin, Todd and Bhatele, Abhinav and Schulz, Martin and Hamann, Bernd and Bremer, Peer-Timo},
abstractNote = {Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure from traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. We apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.},
doi = {10.1109/TPDS.2015.2417531},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 3,
volume = 27,
place = {United States},
year = {2015},
month = mar
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work