Summary: Chapter 1
Introduction
Fault-tolerant computing has traditionally been studied in the context of specific
technologies, architectures, and applications. One consequence of this tradition is
that several subdisciplines of fault-tolerant computing have emerged that are
apparently unrelated to each other: these subdisciplines deal with specific classes of faults,
employ distinct models and design methods, and have their own terminology and
classification [14, 40, 58]. As a result, the discipline itself appears to be fragmented.
Another consequence of this tradition is that verification of fault-tolerant
systems is often based on implementation-specific artifacts (such as stable storage,
timeouts, and shadow registers) without explicitly specifying what properties of
these artifacts are necessary. Such verification is imprecise and hence unsuitable,
especially for safety-critical systems.
Efforts have been made in the last decade to redress the problems described
above. Most of these efforts have focussed on uniformly classifying fault-tolerant
systems, and two noteworthy classifications have emerged. One is based on a distinction
between the notions of faults, errors, and failures: faults in a physical domain can
cause errors in an information domain, whereas errors in an information domain can