U.S. Department of Energy
Office of Scientific and Technical Information

Debugging parallel programs with instant replay

Journal Article · IEEE Trans. Comput.; (United States)

The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay. During program execution we save the relative order of significant events as they occur, not the data associated with such events. As a result, our approach requires less time and space to save the information needed for program replay than other methods. Our technique does not depend on any particular form of interprocess communication. It provides for replay of an entire program, rather than individual processes in isolation. No centralized bottlenecks are introduced, and there is no need for synchronized clocks or a globally consistent logical time. We describe a prototype implementation of Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs.
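
The idea of recording only the relative order of events, not their data, can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a per-object version counter and treats every access as exclusive, and the names `ReplayableObject`, `access`, and `run` are hypothetical. Each thread keeps its own log of the object versions it observed, so there is no central log or shared clock. On replay, a thread blocks until the object reaches the version it saw in the recorded run.

```python
import threading


class ReplayableObject:
    """Shared value with a version counter (hypothetical illustration)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0  # incremented once per access
        self._cond = threading.Condition()

    def access(self, update, log, replay_iter=None):
        """Apply update(value) under mutual exclusion.

        Record mode (replay_iter is None): append the observed version to
        the caller's private log. Only the order is saved, not the data.
        Replay mode: wait until the version matches the next logged entry.
        """
        with self._cond:
            if replay_iter is None:
                log.append(self.version)
            else:
                expected = next(replay_iter)
                self._cond.wait_for(lambda: self.version == expected)
            self.value = update(self.value)
            self.version += 1
            self._cond.notify_all()


def run(num_threads, steps, logs=None):
    """Run the workers; record if logs is None, otherwise replay from logs."""
    obj = ReplayableObject()
    replaying = logs is not None
    new_logs = [[] for _ in range(num_threads)]

    def worker(tid):
        replay_iter = iter(logs[tid]) if replaying else None
        for _ in range(steps):
            # The result depends on interleaving, so it exposes the order.
            obj.access(lambda v: v * 2 + tid, new_logs[tid], replay_iter)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.value, (logs if replaying else new_logs)


if __name__ == "__main__":
    recorded_value, logs = run(num_threads=3, steps=5)
    replayed_value, _ = run(num_threads=3, steps=5, logs=logs)
    print(recorded_value == replayed_value)
```

The logs hold one small integer per access regardless of how much data the access touches, which is the source of the time and space savings the abstract claims. Distinguishing concurrent readers from exclusive writers would need more bookkeeping than this sketch attempts.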

Research Organization:
Dept. of Computer Science, Univ. of Rochester, Rochester, NY 14627
OSTI ID:
6896088
Journal Information:
Journal Name: IEEE Trans. Comput.; (United States); Journal Issue: 4; Vol. C-36:4; ISSN ITCOB
Country of Publication:
United States
Language:
English