OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: TAUOVERSUPERMON: LOW-OVERHEAD ONLINE PARALLEL PERFORMANCE MONITORING

Abstract

Online, or real-time, application performance monitoring tracks performance characteristics during execution rather than post-mortem. This opens up possibilities that are otherwise unavailable, such as real-time visualization and application performance steering, which can be useful for long-running applications. The two fundamental components of such a performance monitor are the measurement system and the transport system. The former captures performance metrics of individual contexts (processes, threads). The latter enables querying the parallel/distributed state from the different contexts and also allows measurement control. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. We adapt and combine two existing, mature systems - Tuning and Analysis Utilities (TAU) and Supermon - to address this problem. TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach of using a cluster monitor, Supermon, as the transport for online performance data from TAU leads to very low-overhead application monitoring, as well as other benefits unavailable with a traditional transport such as NFS.

Authors:
Sottile, Matthew Joseph [1]; Nataraj, Aroon [1]; Malony, Allen [1]; Morris, Alan [1]; Shende, Sameer [1]
  1. Los Alamos National Laboratory
Publication Date:
2007-01-30
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
OSTI Identifier:
985893
Report Number(s):
LA-UR-07-0662
TRN: US201017%%71
DOE Contract Number:
AC52-06NA25396
Resource Type:
Conference
Resource Relation:
Conference: EUROPAR 2007 ; 200708 ; RENNES
Country of Publication:
United States
Language:
English
Subject:
99; METRICS; MONITORING; MONITORS; PERFORMANCE; TRANSPORT; TUNING

Citation Formats

SOTTILE, MATTHEW JOSEPH, NATARAJ, AROON, MALONY, ALLEN, MORRIS, ALAN, and SHENDE, SAMEER. TAUOVERSUPERMON: LOW-OVERHEAD ONLINE PARALLEL PERFORMANCE MONITORING. United States: N. p., 2007. Web.
SOTTILE, MATTHEW JOSEPH, NATARAJ, AROON, MALONY, ALLEN, MORRIS, ALAN, & SHENDE, SAMEER. (2007). TAUOVERSUPERMON: LOW-OVERHEAD ONLINE PARALLEL PERFORMANCE MONITORING. United States.
SOTTILE, MATTHEW JOSEPH, NATARAJ, AROON, MALONY, ALLEN, MORRIS, ALAN, and SHENDE, SAMEER. 2007. "TAUOVERSUPERMON: LOW-OVERHEAD ONLINE PARALLEL PERFORMANCE MONITORING". United States. https://www.osti.gov/servlets/purl/985893.
@article{osti_985893,
title = {TAUOVERSUPERMON: LOW-OVERHEAD ONLINE PARALLEL PERFORMANCE MONITORING},
author = {SOTTILE, MATTHEW JOSEPH and NATARAJ, AROON and MALONY, ALLEN and MORRIS, ALAN and SHENDE, SAMEER},
abstractNote = {Online, or real-time, application performance monitoring tracks performance characteristics during execution rather than post-mortem. This opens up possibilities that are otherwise unavailable, such as real-time visualization and application performance steering, which can be useful for long-running applications. The two fundamental components of such a performance monitor are the measurement system and the transport system. The former captures performance metrics of individual contexts (processes, threads). The latter enables querying the parallel/distributed state from the different contexts and also allows measurement control. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. We adapt and combine two existing, mature systems - Tuning and Analysis Utilities (TAU) and Supermon - to address this problem. TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach of using a cluster monitor, Supermon, as the transport for online performance data from TAU leads to very low-overhead application monitoring, as well as other benefits unavailable with a traditional transport such as NFS.},
doi = {},
journal = {},
place = {United States},
year = {2007},
month = jan
}
