An integrated study of fault tolerance in computing systems

Lin, Tein-Hsiang

Miscellaneous · January 1, 1988

OSTI ID:5921993

A general framework for the design and analysis of distributed fault-tolerant systems is proposed. It covers fault/error occurrence and detection, error propagation, fault location, retry, system reconfiguration, damage assessment, and error recovery. Detection mechanisms are usually assumed to be perfect, so that problems within a particular phase of fault tolerance can be studied without considering its interplay with other phases. This dissertation shows that assuming imperfect detection mechanisms greatly influences fault diagnosis, rollback recovery, and checkpointing. Two additional related problems are studied: the use of retry following fault detection, and the optimal placement of checkpoints in a real-time task with or without the perfect-detection assumption. A fault-classification scheme is developed for on-line estimation of fault parameters.

OSTI does not have a digital full-text copy available.

Research Organization: Michigan Univ., Ann Arbor, MI (USA)

OSTI ID: 5921993

Resource Relation: Other Information: Thesis (Ph.D.)

Country of Publication: United States

Language: English

Similar Records

On fault-tolerant mechanisms in distributed systems

Miscellaneous · January 1, 1988 · OSTI ID:5921993

Israel, S R

Fault-tolerant delivery algorithms

Miscellaneous · January 1, 1990 · OSTI ID:5921993

Al Jaber, H S

The analysis and optimization of fault tolerance in multiprocessor systems: A graph theoretic approach

Miscellaneous · January 1, 1989 · OSTI ID:5921993

Yau, H W

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
FAULT TOLERANT COMPUTERS
DESIGN
STOCHASTIC PROCESSES
ERRORS
FAULT TREE ANALYSIS
COMPUTERS
DIGITAL COMPUTERS
SYSTEM FAILURE ANALYSIS
SYSTEMS ANALYSIS
990200* - Mathematics & Computers
