OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Adaptive message logging for incremental replay of message-passing programs

Conference · OSTI ID: 46277
;  [1]
  1. Brown Univ., Providence, RI (United States). Dept. of Computer Science

Cyclic debugging executes a program over and over to track down bugs. For message-passing parallel programs, however, nondeterminacy makes cyclic debugging impossible without the support of special tools. To provide repeatable executions, messages must be traced for later replay. Because parallel programs are long-running, fast response to debugging queries requires incremental replay, in which reexecution starts from intermediate states instead of from the beginning. Supporting incremental replay requires checkpointing processes periodically as well as tracing messages, and the space cost of saving these messages can be prohibitive. This paper presents an adaptive message logging algorithm that keeps these costs low by logging only a fraction of the messages. The algorithm dynamically tracks dependences among messages to determine which messages would cause domino effects and must therefore be traced. The domino effect can force a replay to start arbitrarily far back in the execution, whereas domino-free replay allows any part of the execution to be reexecuted quickly. Experiments on an iPSC/860 hypercube indicate that the algorithm logs only 1 to 10% of the messages, a reduction of one to two orders of magnitude over past schemes that log every message. The experiments also show that the resulting logs provide a small bound on the amount of reexecution needed to satisfy any replay request. The new logging algorithm thus reduces the overhead of message logging while bounding the response time to replay requests.
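The domino effect, and how logging some messages bounds it, can be illustrated with a small Python sketch. It is not the authors' algorithm: the paper's adaptive scheme tracks dependences among messages dynamically, whereas this sketch substitutes a simpler, hypothetical index-based rule (`must_log`) that logs any message sent in an earlier checkpoint interval than the one in which it is received. All names (`Message`, `must_log`, `replay_start`) are illustrative. Each process is assumed to number its checkpoints 0, 1, 2 and so on, with checkpoint k starting interval k and checkpoint 0 being the start of the program. `replay_start` propagates restart points: an unlogged message received during reexecution forces its sender to restart no later than the interval in which it was sent.

```python
"""Hypothetical sketch of rollback propagation (the domino effect) during
incremental replay, plus a simplified logging rule that bounds it.
This is NOT the paper's adaptive algorithm; the rule below is an assumption."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    send_interval: int  # sender's checkpoint interval when it sent the message
    receiver: int
    recv_interval: int  # receiver's checkpoint interval when it received it


def must_log(msg: Message) -> bool:
    # Assumed index-based rule: log a message sent in an earlier checkpoint
    # interval than the one in which it is received.
    return msg.send_interval < msg.recv_interval


def replay_start(messages: list[Message], logged: set[int],
                 process: int, interval: int) -> dict[int, int]:
    """Return {process: checkpoint to restart from} needed to replay
    `interval` of `process`. Processes absent from the result do not rerun."""
    start = {process: interval}
    changed = True
    while changed:  # iterate until no restart point moves
        changed = False
        for i, m in enumerate(messages):
            if i in logged or m.receiver not in start:
                continue  # logged messages are fed from the log
            if m.recv_interval < start[m.receiver]:
                continue  # received before the receiver's restart point
            # An unlogged message received during reexecution must be
            # regenerated, so its sender must restart no later than the send.
            if m.sender not in start or m.send_interval < start[m.sender]:
                start[m.sender] = m.send_interval
                changed = True
    return start


# Contrived two-process trace whose messages zigzag backward in time.
trace = [
    Message(sender=1, send_interval=2, receiver=0, recv_interval=3),
    Message(sender=0, send_interval=1, receiver=1, recv_interval=2),
    Message(sender=1, send_interval=0, receiver=0, recv_interval=1),
    Message(sender=0, send_interval=3, receiver=1, recv_interval=3),
]

no_log = replay_start(trace, logged=set(), process=0, interval=3)
adaptive = {i for i, m in enumerate(trace) if must_log(m)}
bounded = replay_start(trace, logged=adaptive, process=0, interval=3)
print(no_log)    # {0: 1, 1: 0}
print(adaptive)  # {0, 1, 2}
print(bounded)   # {0: 3}
```

Without logging, replaying interval 3 of process 0 cascades until process 1 must restart from checkpoint 0, the beginning of the program. With the assumed rule, every unlogged message is sent in the same or a later interval than it is received in, so no restart point falls below the requested interval. The trace is contrived to show the cascade; the fraction it logs says nothing about the paper's measured results.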

OSTI ID:
46277
Report Number(s):
CONF-931115-; TRN: IM9522%%279
Resource Relation:
Conference: Supercomputing conference on high performance computing and communications, Portland, OR (United States), 15-19 Nov 1993; Other Information: PBD: 1993; Related Information: Is Part Of Supercomputing '93: Proceedings; PB: 961 p.
Country of Publication:
United States
Language:
English