Fault-tolerant and efficient parallel computation. Doctoral thesis
Technical Report · OSTI ID:7169968
Recent advances in computer technology have made parallel machines a reality. Massively parallel systems use many general-purpose, inexpensive processing elements to attain computation speed-ups comparable to or better than those achieved by expensive, specialized machines with a small number of fast processors. In such a setting, however, one would expect to see an increased number of processor failures attributable to hardware or software. This may eliminate the potential advantage of parallel computation. We believe that this presents a reliability bottleneck that is among the fundamental problems in parallel computation. We investigate algorithmic ways of introducing fault-tolerance in multiprocessors under the constraint of preserving efficiency. This research demonstrates how, in certain models of parallel computation, it is possible to combine efficiency and fault-tolerance. We show that in the models we study, it is possible to develop efficient parallel algorithms without concern for fault-tolerance, and then correctly and efficiently execute these algorithms on parallel machines whose processors are subject to arbitrary dynamic fail-stop errors. By ensuring efficient executions for any pattern of failures, efficiency is also maintained when failures are infrequent or when the expected number of failures is small.
- Research Organization: Brown Univ., Providence, RI (United States). Dept. of Computer Science
- OSTI ID: 7169968
- Report Number(s): AD-A-253350/3/XAB; CS--92-23; CNN: N00014-91-J-1613
- Country of Publication: United States
- Language: English
Similar Records
- Interactive animation of fault-tolerant parallel algorithms · Technical Report · Fri Jan 31 23:00:00 EST 1992 · OSTI ID:6985119
- Algorithm-based fault tolerance on a hypercube multiprocessor · Journal Article · Sat Sep 01 00:00:00 EDT 1990 · IEEE Transactions on Computers (Institute of Electrical and Electronics Engineers); (USA) · OSTI ID:6569965
- A fault-tolerant mapping scheme for a configurable multiprocessor system · Journal Article · Tue Jan 31 23:00:00 EST 1989 · IEEE Trans. Comput.; (United States) · OSTI ID:6275763