U.S. Department of Energy
Office of Scientific and Technical Information

Adaptive message logging for incremental replay of message-passing programs

Conference
OSTI ID:46277
Author affiliation: Brown Univ., Providence, RI (United States). Dept. of Computer Science

Cyclic debugging tracks down bugs by executing a program repeatedly. For message-passing parallel programs, however, nondeterminacy makes cyclic debugging impossible without the support of special tools. To provide repeatable executions, messages must be traced for later replay. Because parallel programs are long-running, fast response to debugging queries requires incremental replay, in which reexecution starts from an intermediate state rather than from the beginning. Incremental replay requires that processes be checkpointed periodically and that messages be logged, and the space cost of saving these messages can be prohibitive. This paper presents an adaptive message logging algorithm that keeps these costs low by logging only a fraction of the messages. The algorithm dynamically tracks dependences among messages to determine which ones cause domino effects and must therefore be traced. The domino effect can force a replay to start arbitrarily far back in the execution, whereas domino-free replay allows any part of the execution to be reexecuted quickly. Experiments on an iPSC/860 hypercube indicate that the algorithm logs only 1-10% of the messages, a reduction of one to two orders of magnitude over past schemes that log every message. The experiments also show that the resulting logs place a small bound on the amount of reexecution needed to satisfy any replay request. The new logging algorithm thus reduces the overhead of message logging while bounding the response time to replay requests.
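The domino effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm; it only computes, for a given set of checkpoints and messages, how far back each process must be reexecuted to replay a requested interval. All names (`Message`, `replay_set`, the event-index convention) are hypothetical and chosen for illustration. It assumes each process's checkpoint list is sorted and includes 0 (the initial state), and that a checkpoint at index t captures the state just before local event t.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_time: int   # local event index on the sender
    receiver: str
    recv_time: int   # local event index on the receiver
    logged: bool     # True if the message contents were saved in a log


def replay_set(checkpoints, messages, target, lo, hi):
    """Return {process: (start, end)}: the local intervals that must be
    reexecuted so that `target` can be replayed over events lo..hi."""

    def latest_checkpoint(proc, t):
        cps = checkpoints[proc]  # sorted, assumed to contain 0
        return cps[bisect_right(cps, t) - 1]

    need = {target: (latest_checkpoint(target, lo), hi)}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for m in messages:
            if m.logged or m.receiver not in need:
                continue  # logged messages are read back from the log
            start, end = need[m.receiver]
            if not (start <= m.recv_time <= end):
                continue
            # An unlogged message must be regenerated by rerunning the
            # sender from its latest checkpoint up to the send event.
            s_start = latest_checkpoint(m.sender, m.send_time)
            old = need.get(m.sender)
            if old is None:
                new = (s_start, m.send_time)
            else:
                new = (min(old[0], s_start), max(old[1], m.send_time))
            if new != old:
                need[m.sender] = new
                changed = True
    return need


checkpoints = {"A": [0, 10], "B": [0, 6]}
m1 = Message("A", 4, "B", 8, logged=False)
m2 = Message("B", 9, "A", 12, logged=False)

# With nothing logged, replaying A over events 12..14 drags A back to 0.
unlogged = replay_set(checkpoints, [m1, m2], "A", 12, 14)

# Logging only m1 breaks the chain: A restarts from its checkpoint at 10.
m1_logged = Message("A", 4, "B", 8, logged=True)
partial = replay_set(checkpoints, [m1_logged, m2], "A", 12, 14)
```

In this example, logging a single message out of two is enough to keep the replay of A bounded by its most recent checkpoint, which is the kind of saving the abstract attributes to selective logging.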

Report Number(s): CONF-931115--
Country of Publication: United States
Language: English

Similar Records

Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance
Conference · Mon Sep 22 00:00:00 EDT 2014 · OSTI ID:1178512

Debugging parallel programs with instant replay
Journal Article · Tue Mar 31 23:00:00 EST 1987 · IEEE Trans. Comput.; (United States) · OSTI ID:6896088

Hardware-assisted replay of microprocessor programs
Book · Mon Dec 31 23:00:00 EST 1990 · OSTI ID:7205533