Deriving Optimal Checkpoint Protocols for Distributed Shared Memory Architectures
Lorenzo Alvisi¹* and Keith Marzullo²**
1 Cornell University, Department of Computer Science, Ithaca NY
2 University of California at San Diego, Department of Computer Science and Engineering, La Jolla CA
Abstract. Uncoordinated checkpointing is one technique used to build processes that can recover to a consistent state after crashing. This technique requires each process to periodically record its state in a checkpoint. In addition, the threads executing on each process log every nondeterministic action they take after the latest checkpoint. When a process crashes, a new process, initialized with the appropriate recorded local state, is created in its place. The new process resumes execution, and whenever one of its threads confronts a nondeterministic choice, the thread consults the log to reproduce the action it performed before the crash. Thus, uncoordinated checkpointing implements the abstraction of a resilient process, in which the crash of a process is translated into intermittent unavailability of that process.
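The checkpoint, log, and replay cycle described above can be illustrated with a minimal Python sketch. It assumes a single thread per process, models every nondeterministic choice as a call to a hypothetical `choose` method, and represents stable storage as an in-memory `StableStorage` object that outlives the crashed process. `StableStorage`, `Process`, `recover`, and `crash_and_recover_demo` are illustrative names, not the protocols derived in the paper.

```python
import copy
import random


class StableStorage:
    """Hypothetical stand-in for storage that survives a process crash."""

    def __init__(self):
        self.checkpoint = None  # last recorded local state
        self.log = []           # nondeterministic choices made since that checkpoint


class Process:
    """A single-threaded process whose only nondeterminism is `choose` (assumption)."""

    def __init__(self, storage, rng):
        self.storage = storage
        self.rng = rng          # models e.g. scheduling or message-arrival order
        self.state = {"counter": 0, "history": []}
        self.replay = []        # logged choices still to be reproduced after recovery

    def take_checkpoint(self):
        # Record the local state; log entries older than it are no longer needed.
        self.storage.checkpoint = copy.deepcopy(self.state)
        self.storage.log.clear()

    def choose(self, options):
        if self.replay:
            # Recovering: reproduce the choice made before the crash.
            return self.replay.pop(0)
        # Normal execution: make a fresh choice and log it before acting on it.
        choice = self.rng.choice(options)
        self.storage.log.append(choice)
        return choice

    def step(self):
        # Deterministic code, apart from the call to choose().
        c = self.choose([1, 2, 3])
        self.state["counter"] += c
        self.state["history"].append(c)


def recover(storage, rng):
    """Create a replacement process from the checkpoint and the log."""
    p = Process(storage, rng)
    p.state = copy.deepcopy(storage.checkpoint)
    p.replay = list(storage.log)
    return p


def crash_and_recover_demo(steps=5):
    storage = StableStorage()
    p = Process(storage, random.Random(1))
    p.take_checkpoint()
    for _ in range(steps):
        p.step()
    state_before_crash = copy.deepcopy(p.state)

    # Crash: p's volatile state is lost; only `storage` survives.
    # The replacement uses a different random source, so only replay
    # from the log keeps its re-execution identical to the original.
    q = recover(storage, random.Random(2))
    for _ in range(steps):
        q.step()
    return state_before_crash, q.state
```

Because the recovered process consumes logged choices before drawing fresh ones, re-executing the same steps rebuilds exactly the pre-crash state even though its random source differs. The sketch models one process in isolation and says nothing about interactions between processes.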
We give a specification of the consistency property "no orphan threads"