U.S. Department of Energy
Office of Scientific and Technical Information

Design and analysis of dynamic redundancy networks

Journal Article · IEEE Trans. Comput.; (United States)
DOI: https://doi.org/10.1109/12.2253 · OSTI ID: 6337943
Most previous work in the fault-tolerant design of multistage interconnection networks (MIN's) has been based on improving the reliabilities of the networks themselves. For parallel systems containing a large number of processing elements (PE's), the capability to recover from a PE fault is also important. The dynamic redundancy (DR) network is investigated in this paper. It can tolerate faults in the network and, by adding spare PE's, allow a system to tolerate PE faults without degradation, while retaining the full capability of a multistage cube network. The DR network can also be controlled by the same routing tags used for the multistage cube. Hence, with a recovery procedure added in the operating system, programs which can be executed in a system based on a multistage cube can be executed in a system based on the proposed network before and after a fault without any modification. A variation of the DR network, the reduced DR network, is also considered, which can be implemented more cost effectively than the DR while retaining most of the advantages of the DR. The reliabilities of DR-based systems with one spare PE and the reliabilities of systems with no spare PE's are estimated and compared, and the effect of adding multiple spare PE's is analyzed. It is shown that no matter how much redundancy is added into an MIN, the system reliability cannot exceed a certain bound; however, using the DR and spare PE's, this bound can be exceeded.
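The closing claim of the abstract can be illustrated with a rough sketch. The model below is an assumption for illustration only and is not taken from the paper: every PE is assumed to work independently with the same reliability `r`, the network and the recovery procedure are assumed to be perfect, and a system with `s` spares is assumed to survive as long as at least `n` of its `n + s` PE's work. The function names `no_spare_bound` and `with_spares` are hypothetical. Under these assumptions, a system with no spares can never be more reliable than `r ** n`, however much redundancy the network itself has, while adding spares raises the achievable reliability above that figure.

```python
from math import comb


def no_spare_bound(r: float, n: int) -> float:
    """Upper bound on system reliability when all n PEs must work.

    Assumption: the network is ideal, so only PE failures matter.
    """
    return r ** n


def with_spares(r: float, n: int, s: int) -> float:
    """Probability that at least n of the n + s PEs are working.

    Assumption: independent, identical PEs and a perfect recovery
    procedure (a simple k-out-of-n model, not the paper's analysis).
    """
    total = n + s
    return sum(
        comb(total, k) * r ** k * (1 - r) ** (total - k)
        for k in range(n, total + 1)
    )


if __name__ == "__main__":
    r, n = 0.99, 64  # illustrative values only
    print("no spares :", no_spare_bound(r, n))
    print("one spare :", with_spares(r, n, 1))
    print("two spares:", with_spares(r, n, 2))
```

With `s = 0` the second function reduces to the first, which is a quick sanity check on the model. A realistic estimate would also have to account for the reliability of the network and of the spare-switching hardware, which this sketch ignores.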
Research Organization:
Dept. of Computer Science, Univ. of Houston, Houston, TX (US)
OSTI ID:
6337943
Journal Information:
IEEE Trans. Comput.; (United States), Vol. 37:9; ISSN ITCOB
Country of Publication:
United States
Language:
English

Similar Records

Fault tolerance and dynamic partitioning in large-scale parallel systems
Thesis/Dissertation · Wed Dec 31 23:00:00 EST 1986 · OSTI ID:5533826

Fault tolerance in multistage interconnection network-based multicomputer systems
Thesis/Dissertation · Wed Dec 31 23:00:00 EST 1986 · OSTI ID:5705671

Fault tolerance capabilities in multistage network-based multicomputer systems
Journal Article · Fri Jul 01 00:00:00 EDT 1988 · IEEE Trans. Comput.; (United States) · OSTI ID:6992962