U.S. Department of Energy
Office of Scientific and Technical Information

Fault tolerance and reliability analysis of large-scale multicomputer systems

Thesis/Dissertation · OSTI ID:6045974

Fault tolerance is to become an integral part of the architectural design of large-scale systems, and reliability an important measure in the evaluation of their performance. This work addresses the effects of increased processor failure rates in large-scale, gracefully degradable distributed computing systems. A probabilistic model of network disconnection is developed and used to evaluate the effects of node failures on the network topology. The results show that although the probability of network disconnection decreases with increasing system size, the resilience of a given topology to network disconnection decreases when the connectivity is kept constant. Combined measures of performance and reliability are used to evaluate the trade-off between increased computational power and increased failure rates as the number of processors grows. For a given recovery mechanism, there exists an optimal number of processors at which the amount of reliable computational work the system can deliver is maximized. Finally, a simple distributed iterative algorithm for fault tolerance is presented and evaluated. Based on a functional execution model of tasks, this algorithm allows the implementation of run-time fault detection, checkpointing, and recovery.
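
The record does not reproduce the thesis's probabilistic model, so the following is only an illustrative sketch of what "probability of network disconnection under node failures" can mean in practice. It assumes a binary hypercube topology, independent node failures with a fixed probability, and a Monte Carlo estimate in place of an analytical model. The function names (`hypercube_neighbors`, `survivors_connected`, `estimate_disconnection`) and all parameters are hypothetical and are not taken from the thesis.

```python
import random
from collections import deque


def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a dim-dimensional binary hypercube (flip one bit)."""
    return [node ^ (1 << b) for b in range(dim)]


def survivors_connected(alive, dim):
    """Return True if the surviving nodes form one connected component.

    Assumption made for this sketch: zero or one survivor counts as connected.
    """
    if len(alive) <= 1:
        return True
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    # Breadth-first search restricted to surviving nodes.
    while queue:
        u = queue.popleft()
        for v in hypercube_neighbors(u, dim):
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)


def estimate_disconnection(dim, p_fail, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that independent node
    failures (each with probability p_fail) disconnect the survivors."""
    rng = random.Random(seed)
    disconnected = 0
    for _ in range(trials):
        alive = {n for n in range(1 << dim) if rng.random() >= p_fail}
        if not survivors_connected(alive, dim):
            disconnected += 1
    return disconnected / trials
```

Calling `estimate_disconnection(dim, p_fail)` for several values of `dim` gives a rough empirical picture of how disconnection probability varies with system size in this one topology. In a hypercube the node degree grows with `dim`, so the constant-connectivity case mentioned in the abstract would need a different topology, such as a ring or a fixed-degree mesh.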

Research Organization: University of Southern California, Los Angeles, CA (USA)
OSTI ID: 6045974
Country of Publication: United States
Language: English

Similar Records

Network resilience; A measure of network fault tolerance
Journal Article · Wed Jan 31 23:00:00 EST 1990 · IEEE Transactions on Computers (Institute of Electrical and Electronics Engineers); (USA) · OSTI ID:6987690

Fault tolerance for VLSI multicomputers
Thesis/Dissertation · Mon Dec 31 23:00:00 EST 1984 · OSTI ID:5127488

Design of fault-tolerant protocols for distributed processing systems
Thesis/Dissertation · Thu Dec 31 23:00:00 EST 1987 · OSTI ID:6988027