Supporting Fault-Tolerance in Streaming Grid Applications

Summary: Supporting Fault-Tolerance in Streaming Grid Applications
Qian Zhu, Liang Chen, Gagan Agrawal
Department of Computer Science and Engineering
Ohio State University
Columbus, OH, 43210
Abstract: This paper considers the problem of supporting and
efficiently implementing fault tolerance for tightly coupled and
pipelined applications, especially streaming applications, in a grid
environment. We provide an alternative to basic checkpointing
and use the notion of a Light-weight Summary Structure (LSS) to
enable efficient failure recovery. The idea behind LSS is that, at
certain points during the execution of a processing stage, the
state of the program can be summarized in a small amount
of memory. This allows us to store copies of the LSS for
failure recovery, which yields low-overhead fault tolerance. Our
work can be viewed as an optimization and adaptation of the
idea of application-level checkpointing to a different execution
environment and a different class of applications.
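
The LSS idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class CountingStage, the function run_with_lss, and the parameters save_every and fail_at are all hypothetical names chosen for illustration. The sketch also assumes that the input stream can be replayed from a saved position, which a real streaming system would have to provide, for example by buffering upstream.

```python
class CountingStage:
    """Hypothetical processing stage: keeps a running count and sum of a stream."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.position = 0  # number of stream items consumed so far

    def process(self, item):
        self.count += 1
        self.total += item
        self.position += 1

    def summary(self):
        # The small state that is enough to resume from this point
        # (stands in for a light-weight summary structure).
        return {"count": self.count, "total": self.total,
                "position": self.position}

    @classmethod
    def from_summary(cls, lss):
        # Rebuild a stage from a stored summary after a failure.
        stage = cls()
        stage.count = lss["count"]
        stage.total = lss["total"]
        stage.position = lss["position"]
        return stage


def run_with_lss(stream, save_every=3, fail_at=None):
    """Process a replayable list, storing a summary every `save_every` items.

    If `fail_at` is given, a single failure is simulated just before the
    item at that index, and the stage is rebuilt from the last summary.
    """
    stage = CountingStage()
    saved = stage.summary()  # assumed to be held elsewhere, e.g. another node
    i = 0
    failed = False
    while i < len(stream):
        if fail_at is not None and i == fail_at and not failed:
            failed = True
            stage = CountingStage.from_summary(saved)  # recover from summary
            i = saved["position"]                      # replay from saved point
            continue
        stage.process(stream[i])
        i += 1
        if i % save_every == 0:
            saved = stage.summary()
    return stage.count, stage.total
```

In this sketch only three numbers are stored per summary, regardless of how many items have been processed, and a run with a simulated failure produces the same result as a run without one.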


Source: Agrawal, Gagan - Department of Computer Science and Engineering, Ohio State University


Collections: Computer Technologies and Information Sciences