Design and analysis of dynamic redundancy networks
Journal Article
·
· IEEE Trans. Comput.; (United States)
Most previous work in the fault-tolerant design of multistage interconnection networks (MIN's) has been based on improving the reliabilities of the networks themselves. For parallel systems containing a large number of processing elements (PE's), the capability to recover from a PE fault is also important. The dynamic redundancy (DR) network is investigated in this paper. It can tolerate faults in the network and, through the addition of spare PE's, allows a system to tolerate PE faults without degradation, while retaining the full capability of a multistage cube network. The DR network can also be controlled by the same routing tags used for the multistage cube. Hence, with a recovery procedure added in the operating system, programs that can be executed in a system based on a multistage cube can be executed, without any modification, in a system based on the proposed network both before and after a fault. A variation of the DR network, the reduced DR network, is also considered; it can be implemented more cost-effectively than the DR network while retaining most of its advantages. The reliabilities of DR-based systems with one spare PE and the reliabilities of systems with no spare PE's are estimated and compared, and the effect of adding multiple spare PE's is analyzed. It is shown that no matter how much redundancy is added into an MIN, the system reliability cannot exceed a certain bound; however, using the DR network and spare PE's, this bound can be exceeded.
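The bound mentioned at the end of the abstract can be illustrated with a small calculation. The sketch below is not the paper's reliability model; it assumes, purely for illustration, that PE failures are independent, that every PE has the same reliability `r_pe`, and that recovery onto a spare always succeeds. The function names and the parameter `r_net` (network reliability) are hypothetical. Under those assumptions, a system with no spares needs all of its PE's to work, so even a perfect network leaves it capped at `r_pe ** n_pe`, whereas a system with spares survives as long as the number of failed PE's does not exceed the number of spares.

```python
from math import comb


def no_spare_bound(n_pe: int, r_pe: float) -> float:
    """Upper bound on system reliability with no spare PEs.

    Assumes a perfect network (reliability 1), so the system works
    only if all n_pe processing elements work.
    """
    return r_pe ** n_pe


def with_spares(n_pe: int, n_spare: int, r_pe: float, r_net: float = 1.0) -> float:
    """Reliability of a system with n_pe required PEs and n_spare spares.

    Illustrative assumption: the system works if the network works and
    at most n_spare of the (n_pe + n_spare) PEs have failed.
    """
    total = n_pe + n_spare
    pe_ok = sum(
        comb(total, k) * (1.0 - r_pe) ** k * r_pe ** (total - k)
        for k in range(n_spare + 1)
    )
    return r_net * pe_ok


if __name__ == "__main__":
    n, r = 64, 0.99
    print("no-spare bound:", no_spare_bound(n, r))
    print("one spare     :", with_spares(n, 1, r))
    print("two spares    :", with_spares(n, 2, r))
```

With zero spares the second function reduces to the first, which is a quick sanity check on the formula. Any real comparison would also need a network reliability below 1 and a model of recovery coverage, both of which are left out here.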
- Research Organization:
- Dept. of Computer Science, Univ. of Houston, Houston, TX (US)
- OSTI ID:
- 6337943
- Journal Information:
- Journal Name: IEEE Trans. Comput.; (United States) Vol. 37:9; ISSN ITCOB
- Country of Publication:
- United States
- Language:
- English
Similar Records
Fault tolerance and dynamic partitioning in large-scale parallel systems
Thesis/Dissertation
·
Wed Dec 31 23:00:00 EST 1986
·
OSTI ID:5533826
Fault tolerance in multistage interconnection network-based multicomputer systems
Thesis/Dissertation
·
Wed Dec 31 23:00:00 EST 1986
·
OSTI ID:5705671
Fault tolerance capabilities in multistage network-based multicomputer systems
Journal Article
·
Fri Jul 01 00:00:00 EDT 1988
· IEEE Trans. Comput.; (United States)
·
OSTI ID:6992962