Summary: An Analysis of Communication Induced Checkpointing
Lorenzo Alvisi*‡, Elmootazbellah Elnozahy†, Sriram Rao*‡, Syed Amir Husain‡, Asanka De Mel‡
‡ Department of Computer Sciences, U. T. Austin
† IBM Austin Research Lab
Austin, Texas, USA
Abstract
Communication induced checkpointing (CIC) allows processes in
a distributed computation to take independent checkpoints and
to avoid the domino effect. This paper presents an analysis of
CIC protocols based on a prototype implementation and validated
simulations. Our results indicate that there is sufficient evidence to
suspect that much of the conventional wisdom about these protocols
is questionable.
1 Introduction
There are three styles for implementing application-transparent
rollback-recovery in message-passing systems, namely coordinated
checkpointing, message logging, and communication-induced
checkpointing (CIC) [5]. Both coordinated checkpointing and
message logging have received considerable analysis in the literature
[6, 12, 14, 15], but little is known about the behavior of CIC
protocols. This paper presents an experimental analysis of these