U.S. Department of Energy
Office of Scientific and Technical Information

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs

Conference
The Message-Passing Interface (MPI) is large and complex, which makes MPI programming error-prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or nonportable constructs. To our knowledge, none of these tools scales to more than about 100 processes. However, some current HPC systems use more than 100,000 cores, and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof-of-concept implementation.
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
974859
Report Number(s):
LLNL-CONF-426350
Country of Publication:
United States
Language:
English

References (6)

Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications · Book · July 2008
ISP: a tool for model checking MPI programs · Conference · January 2008
  • Vakkalanka, Sarvani S.; Sharma, Subodh; Gopalakrishnan, Ganesh
  • Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08 · https://doi.org/10.1145/1345206.1345258
Deadlock detection in MPI programs · Journal · January 2002
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools · Conference · January 2003
Dynamic Software Testing of MPI Applications with Umpire · Conference · January 2000
PnMPI tools: a whole lot greater than the sum of their parts · Conference · January 2007

Similar Records

MPI Runtime Error Detection with MUST: Advances in Deadlock Detection
Journal Article · Mon Dec 31 19:00:00 EST 2012 · Scientific Programming · OSTI ID:1197888

Dynamic Software Testing of MPI Applications with Umpire
Conference · Mon Jul 24 00:00:00 EDT 2000 · OSTI ID:15006499

A Dynamic MPI Software Correctness Checking Tool
Software · Mon Oct 31 00:00:00 EST 2005 · OSTI ID:1230839