Title: TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring

Conference
OSTI ID:985893

Online, or real-time, application performance monitoring tracks performance characteristics during execution rather than post-mortem. This opens up possibilities that are otherwise unavailable, such as real-time visualization and application performance steering, which can be useful for long-running applications. The two fundamental components of such a performance monitor are the measurement system and the transport system. The former captures performance metrics of individual contexts (processes, threads). The latter enables querying the parallel/distributed state from the different contexts and also allows measurement control. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. We adapt and combine two existing, mature systems, Tuning and Analysis Utilities (TAU) and Supermon, to address this problem. TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach of using a cluster monitor, Supermon, as the transport for online performance data from TAU leads to very low-overhead application monitoring, as well as other benefits unavailable from a traditional transport such as NFS.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC52-06NA25396
OSTI ID:
985893
Report Number(s):
LA-UR-07-0662; TRN: US201017%%71
Resource Relation:
Conference: EUROPAR 2007 ; 200708 ; RENNES
Country of Publication:
United States
Language:
English