OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

Abstract

Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. Analyzing the performance of such workflows is challenging: they run over a large number of nodes with many parallel task executions, and the workflow data and execution measurements can themselves amount to terabytes or petabytes. To help identify performance bottlenecks and debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply sophisticated statistical tools and data mining methods to the performance data. It utilizes an efficient data processing engine that allows users to interactively analyze large amounts of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows of an astronomy project known as the Palomar Transient Factory (PTF) and on the job logs of a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.
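
The abstract sketches a concrete pipeline: ingest system logs and performance measurements, extract key performance features, and apply statistical methods interactively through an efficient data processing engine. The following minimal sketch illustrates that shape of pipeline; it assumes a Spark-style engine (the record does not name the engine) and an entirely hypothetical log path and schema (node_id, cpu_util, io_wait, mem_used), and is not the authors' implementation.

# Illustrative sketch only: the engine (PySpark), the log path, the JSON
# format, and the column names are all assumptions, not details from the
# record.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("perf-analysis-sketch").getOrCreate()

# Ingest raw system logs (hypothetical path and format).
logs = spark.read.json("hdfs:///logs/system/*.json")

# Extract key performance features per node.
features = logs.groupBy("node_id").agg(
    F.avg("cpu_util").alias("avg_cpu"),
    F.avg("io_wait").alias("avg_io_wait"),
    F.max("mem_used").alias("peak_mem"))

# A simple statistical step: flag nodes whose average I/O wait sits more
# than two standard deviations above the cluster-wide mean.
stats = features.agg(F.avg("avg_io_wait").alias("mu"),
                     F.stddev("avg_io_wait").alias("sigma")).first()
outliers = features.filter(
    F.col("avg_io_wait") > stats["mu"] + 2 * stats["sigma"])
outliers.show()

Aggregating per node before computing cluster-wide statistics keeps the interactive result set small even when the raw logs span terabytes, which matches the interactive-analysis goal stated in the abstract.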

Authors:
Yoo, Wucherl [1]; Koo, Michelle [2]; Cao, Yu [3]; Sim, Alex [1]; Nugent, Peter [4]; Wu, Kesheng [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Univ. of California, Berkeley, CA (United States)
  3. California Inst. of Technology (CalTech), Pasadena, CA (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
September 17, 2016
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1393595
DOE Contract Number:
AC02-05CH11231
Resource Type:
Book
Resource Relation:
Related Information: Book Title: Conquering Big Data Using High Performance Computing, Arora, R. (ed.)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Yoo, Wucherl, Koo, Michelle, Cao, Yu, Sim, Alex, Nugent, Peter, and Wu, Kesheng. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters. United States: N. p., 2016. Web. doi:10.1007/978-3-319-33742-5_7.
Yoo, Wucherl, Koo, Michelle, Cao, Yu, Sim, Alex, Nugent, Peter, & Wu, Kesheng. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters. United States. doi:10.1007/978-3-319-33742-5_7.
Yoo, Wucherl, Koo, Michelle, Cao, Yu, Sim, Alex, Nugent, Peter, and Wu, Kesheng. 2016. "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters". United States. doi:10.1007/978-3-319-33742-5_7. https://www.osti.gov/servlets/purl/1393595.
@incollection{osti_1393595,
title = {Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters},
author = {Yoo, Wucherl and Koo, Michelle and Cao, Yu and Sim, Alex and Nugent, Peter and Wu, Kesheng},
abstractNote = {Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. Analyzing the performance of such workflows is challenging: they run over a large number of nodes with many parallel task executions, and the workflow data and execution measurements can themselves amount to terabytes or petabytes. To help identify performance bottlenecks and debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply sophisticated statistical tools and data mining methods to the performance data. It utilizes an efficient data processing engine that allows users to interactively analyze large amounts of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows of an astronomy project known as the Palomar Transient Factory (PTF) and on the job logs of a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.},
booktitle = {Conquering Big Data Using High Performance Computing},
editor = {Arora, R.},
doi = {10.1007/978-3-319-33742-5_7},
place = {United States},
year = {2016},
month = {sep}
}

Book:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this book.
