Summary: Low-cost Fault-tolerance in Barrier Synchronizations
Sandeep S. Kulkarni Anish Arora
Department of Computer and Information Science 1
The Ohio State University
Columbus, OH 43210 USA
Abstract
In this paper, we show how tolerance to several types of faults can be
effectively added to program computations that use barrier synchronization. We
divide the faults that occur in practice into two classes, detectable and undetectable,
and design a fully distributed program that tolerates the faults in both classes. Our
program guarantees that every barrier is executed correctly even if detectable faults
occur, and that eventually every barrier is executed correctly even if undetectable
faults occur. Via analytical as well as simulation results, we show that the cost
of adding fault-tolerance is low, in part by comparing the time required by our
program with that required by its fault-intolerant counterpart.
Keywords: fault-tolerance, multitolerance, detectable and undetectable faults,
synchronization, concurrency.
1 Email: {kulkarni,anish}@cis.ohio-state.edu; Web: http://www.cis.ohio-state.edu/{~kulkarni,~anish}.
Research supported in part by NSF Grant CCR-9308640, OSU Grant 221506, and NSA MDA904-96-1-1011.