DAFT: Decoupled Acyclic Fault Tolerance

Summary: DAFT: Decoupled Acyclic Fault Tolerance
Yun Zhang, Jae W. Lee, Nick P. Johnson, and David I. August
Department of Computer Science, Princeton University, 35 Olden St., Princeton, NJ 08540.
Email: {yunzhang, jl7, npjohnso, august}@princeton.edu
Higher transistor counts, lower voltage levels, and reduced noise margin increase
the susceptibility of multicore processors to transient faults. Redundant hardware
modules can detect such faults, but software techniques are more appealing for their
low cost and flexibility. Recent software proposals have not achieved widespread
acceptance because they either increase register pressure, double memory usage, or
are too slow in the absence of hardware extensions. This paper presents DAFT, a
fast, safe, and memory-efficient transient fault detection framework for commodity
multicore systems. DAFT replicates computation across multiple cores and schedules
fault detection off the critical path. Where possible, values are speculated to
be correct and only communicated to the redundant thread at essential program
points. DAFT is implemented in the LLVM compiler framework and evaluated using
SPEC CPU2000 and SPEC CPU2006 benchmarks on a commodity multicore
system. Evaluation results demonstrate that speculation allows DAFT to reduce
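
The general idea described in the summary, redundant execution with checking kept off the critical path, can be illustrated with a toy sketch. This is not DAFT's LLVM transformation. The function names (leading, trailing, run_checked), the fault_at parameter, and the choice of a running sum of squares as the workload are all hypothetical, chosen only for illustration. The original thread sends values one way to a checking thread and never waits for a reply, and the checking thread recomputes the same values and compares them. Python threads do not run in parallel on separate cores, so the sketch shows the communication pattern only, not the performance behavior.

```python
import queue
import threading


def leading(data, out_q, fault_at=None):
    """Original computation; sends a value only where it would store one."""
    acc = 0
    for i, x in enumerate(data):
        acc += x * x
        if i == fault_at:
            acc ^= 1 << 3  # hypothetical injected bit flip (simulated fault)
        out_q.put(acc)  # one-way send; the leading thread never waits


def trailing(data, in_q, result):
    """Redundant computation; recomputes and checks each received value."""
    acc = 0
    for i, x in enumerate(data):
        acc += x * x
        if in_q.get() != acc:
            result["fault_at"] = i  # mismatch: report where it was seen
            return
    result["fault_at"] = None  # all values agreed


def run_checked(data, fault_at=None):
    """Run both copies; return the index of the first mismatch, or None."""
    q = queue.Queue()  # unbounded, so the leading thread is never blocked
    result = {}
    t1 = threading.Thread(target=leading, args=(data, q, fault_at))
    t2 = threading.Thread(target=trailing, args=(data, q, result))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return result["fault_at"]
```

Calling run_checked([1, 2, 3, 4]) reports no mismatch, while run_checked([1, 2, 3, 4], fault_at=2) reports a mismatch at index 2, the iteration where the simulated bit flip was injected.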


Source: August, David - Department of Computer Science, Princeton University


Collections: Computer Technologies and Information Sciences