Title: Design and analysis of dynamic redundancy networks

Journal Article · IEEE Trans. Comput.; (United States)
DOI: https://doi.org/10.1109/12.2253 · OSTI ID: 6337943

Most previous work in the fault-tolerant design of multistage interconnection networks (MIN's) has been based on improving the reliabilities of the networks themselves. For parallel systems containing a large number of processing elements (PE's), the capability to recover from a PE fault is also important. The dynamic redundancy (DR) network is investigated in this paper. It can tolerate faults in the network and, by adding spare PE's, enable a system to tolerate PE faults without degradation, while retaining the full capability of a multistage cube network. The DR network can also be controlled by the same routing tags used for the multistage cube. Hence, with a recovery procedure added in the operating system, programs which can be executed in a system based on a multistage cube can be executed in a system based on the proposed network before and after a fault without any modification. A variation of the DR network, the reduced DR network, is also considered, which can be implemented more cost effectively than the DR while retaining most of the advantages of the DR. The reliabilities of DR-based systems with one spare PE and the reliabilities of systems with no spare PE's are estimated and compared, and the effect of adding multiple spare PE's is analysed. It is shown that no matter how much redundancy is added into an MIN, the system reliability cannot exceed a certain bound; however, using the DR and spare PE's, this bound can be exceeded.
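
The bound mentioned at the end of the abstract can be illustrated with a rough sketch. This is not the paper's reliability model: it assumes independent PE failures, a single per-PE reliability value, and a perfectly reliable network, and the function names and parameters are hypothetical. Under those assumptions, a system that needs all N PE's can be no more reliable than the product of the PE reliabilities, whatever is done to the network, while a system with spare PE's only needs N of the N + s PE's to survive.

```python
from math import comb


def no_spare_bound(r_pe: float, n: int) -> float:
    """Upper bound on system reliability when all n PEs must work.

    Assumes independent PE failures and an ideal (never failing) network,
    so network redundancy alone cannot push reliability above this value.
    """
    return r_pe ** n


def with_spares(r_pe: float, n: int, spares: int) -> float:
    """Reliability when any n of the n + spares PEs must work.

    Same assumptions as above: independent PE failures, ideal network,
    and ideal recovery onto a spare (all simplifications for illustration).
    """
    total = n + spares
    # Sum the probabilities of exactly k failed PEs, for k = 0 .. spares.
    return sum(
        comb(total, k) * r_pe ** (total - k) * (1.0 - r_pe) ** k
        for k in range(spares + 1)
    )


if __name__ == "__main__":
    n, r = 64, 0.99  # illustrative values, not taken from the paper
    print("no spare, ideal network:", no_spare_bound(r, n))
    print("one spare, ideal network:", with_spares(r, n, 1))
```

With zero spares the second function reduces to the first, and for any PE reliability strictly between 0 and 1 a single spare already gives a value above the no-spare bound, which is the qualitative effect the abstract describes.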

Research Organization:
Dept. of Computer Science, Univ. of Houston, Houston, TX (US)
OSTI ID:
6337943
Journal Information:
IEEE Trans. Comput.; (United States), Vol. 37:9
Country of Publication:
United States
Language:
English