Title: MUST: A Scalable Approach to Runtime Error Detection in MPI Programs

Abstract

The Message-Passing Interface (MPI) is large and complex. Therefore, programming MPI is error prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or nonportable constructs. To our knowledge, none of these tools scales to more than about 100 processes. However, some of the current HPC systems use more than 100,000 cores, and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof-of-concept implementation.

Authors:
Hilbrich, T; Schulz, M; de Supinski, B R; Muller, M
Publication Date:
March 2010
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
974859
Report Number(s):
LLNL-CONF-426350
TRN: US201007%%858
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: Presented at: 3rd Parallel Tools Workshop, Dresden, Germany, Sep 14 - Sep 15, 2009
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; DESIGN; DETECTION; IMPLEMENTATION; PERFORMANCE; PROGRAMMING

Citation Formats

Hilbrich, T, Schulz, M, de Supinski, B R, and Muller, M. MUST: A Scalable Approach to Runtime Error Detection in MPI Programs. United States: N. p., 2010. Web.
Hilbrich, T, Schulz, M, de Supinski, B R, & Muller, M. MUST: A Scalable Approach to Runtime Error Detection in MPI Programs. United States.
Hilbrich, T, Schulz, M, de Supinski, B R, and Muller, M. 2010. "MUST: A Scalable Approach to Runtime Error Detection in MPI Programs". United States. https://www.osti.gov/servlets/purl/974859.
@article{osti_974859,
title = {MUST: A Scalable Approach to Runtime Error Detection in MPI Programs},
author = {Hilbrich, T and Schulz, M and de Supinski, B R and Muller, M},
place = {United States},
year = {2010},
month = {3}
}
