DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: PATHA: Performance Analysis Tool for HPC Applications

Abstract

Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run on thousands of CPU cores while simultaneously performing data accesses, data movements, and computation, which makes it difficult to identify bottlenecks or debug performance issues. To address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open-source big data processing tools. Our framework ingests system logs to extract key performance measures and applies advanced statistical tools and data mining methods to the performance data. Furthermore, it uses an efficient data processing engine that allows users to interactively analyze large volumes of logs and measurements of different types. To illustrate the functionality of PATHA, we conducted a case study on the workflows of an astronomy project known as the Palomar Transient Factory (PTF). This study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we identified performance bottlenecks in three tasks of the PTF workflow; these bottlenecks depend on the density of celestial objects.
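The abstract's description of ingesting system logs to extract performance measures can be illustrated with a small sketch. The Python below is a hypothetical example, not PATHA's code: it assumes a simple whitespace-separated log format with START/END events per job and task, pairs those events into durations, and summarizes each task with basic statistics. The log format, the task names, and the functions `task_durations` and `summarize` are all assumptions made for illustration; the paper's actual processing engine is not shown here.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical log line format (an assumption, not the PTF/Edison log format):
#   "<unix_seconds> <job_id> <task_name> START|END"
LINE_RE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?)\s+(?P<job>\S+)\s+(?P<task>\S+)\s+(?P<event>START|END)$"
)


def task_durations(lines):
    """Pair START/END events per (job, task) and return durations grouped by task."""
    starts = {}
    durations = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not match the assumed format
        key = (m["job"], m["task"])
        ts = float(m["ts"])
        if m["event"] == "START":
            starts[key] = ts
        elif key in starts:
            # END event: duration is the time since the matching START
            durations[m["task"]].append(ts - starts.pop(key))
    return durations


def summarize(durations):
    """Per-task count, mean, median, and max of durations (in seconds)."""
    return {
        task: {
            "count": len(d),
            "mean": statistics.mean(d),
            "median": statistics.median(d),
            "max": max(d),
        }
        for task, d in durations.items()
    }


if __name__ == "__main__":
    # Hypothetical sample records; "detect" and "photometry" are made-up task names.
    sample = [
        "100 job1 detect START",
        "130 job1 detect END",
        "100 job2 detect START",
        "190 job2 detect END",
        "130 job1 photometry START",
        "140 job1 photometry END",
    ]
    for task, stats in summarize(task_durations(sample)).items():
        print(task, stats)
```

Grouping durations by task gives the kind of view in which a consistently slow task stands out. At the 1.6 TB scale of the case study, a distributed data processing engine would presumably replace this single-process loop.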

Authors:
Yoo, Wucherl [1]; Koo, Michelle [2]; Cao, Yi [3]; Sim, Alex [1]; Nugent, Peter [4]; Wu, Kesheng [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Univ. of California, Berkeley, CA (United States)
  3. California Inst. of Technology (CalTech), Pasadena, CA (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
February 18, 2016
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1379097
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
IEEE International Performance, Computing, and Communications Conference
Additional Journal Information:
Journal Volume: 2016; Conference: 34th IEEE International Performance, Computing, and Communications Conference (IPCCC 2015), Nanjing (China), 14-16 Dec 2015; Journal ID: ISSN 1097-2641
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; performance analysis; performance evaluation; high performance computing

Citation Formats

Yoo, Wucherl, Koo, Michelle, Cao, Yi, Sim, Alex, Nugent, Peter, and Wu, Kesheng. PATHA: Performance Analysis Tool for HPC Applications. United States: N. p., 2016. Web. doi:10.1109/PCCC.2015.7410313.
Yoo, Wucherl, Koo, Michelle, Cao, Yi, Sim, Alex, Nugent, Peter, & Wu, Kesheng. PATHA: Performance Analysis Tool for HPC Applications. United States. https://doi.org/10.1109/PCCC.2015.7410313
Yoo, Wucherl, Koo, Michelle, Cao, Yi, Sim, Alex, Nugent, Peter, and Wu, Kesheng. 2016. "PATHA: Performance Analysis Tool for HPC Applications". United States. https://doi.org/10.1109/PCCC.2015.7410313. https://www.osti.gov/servlets/purl/1379097.
@article{osti_1379097,
title = {PATHA: Performance Analysis Tool for HPC Applications},
author = {Yoo, Wucherl and Koo, Michelle and Cao, Yi and Sim, Alex and Nugent, Peter and Wu, Kesheng},
abstractNote = {Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs are often running over thousands of CPU cores and simultaneously performing data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug the performance issues in these large workflows. In order to address these challenges, we have developed Performance Analysis Tool for HPC Applications (PATHA) using the state-of-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures, and apply the most sophisticated statistical tools and data mining methods on the performance data. Furthermore, it utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). This study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of PTF workflow with the dependency on the density of celestial objects.},
doi = {10.1109/PCCC.2015.7410313},
journal = {IEEE International Performance, Computing, and Communications Conference},
volume = 2016,
place = {United States},
year = {2016},
month = feb
}