Summary: Egida: An Extensible Toolkit For Low-overhead Fault-Tolerance
Sriram Rao, Lorenzo Alvisi, Harrick M. Vin
Department of Computer Sciences
The University of Texas at Austin
Taylor Hall 2.124, Austin, Texas 78712-1188, USA.
We discuss the design and implementation of Egida, an object-oriented
toolkit designed to support transparent rollback-recovery.
Egida exports a simple specification language that can be used
to express arbitrary rollback recovery protocols. From this
specification, Egida automatically synthesizes an implementation of
the specified protocol by gluing together the appropriate objects
from an available library of "building blocks". Egida is extensible
and facilitates rapid implementation of rollback recovery protocols
with minimal programming effort. We have integrated Egida with
the MPICH implementation of the MPI standard. Existing MPI
applications can take advantage of Egida without any modifications:
fault-tolerance is achieved transparently, and all that is needed
is a simple relink of the MPI application with Egida.