Fault tolerance in multistage interconnection network-based multicomputer systems
Multistage interconnection networks are among the few cost-effective ways to connect a large number of computers for parallel execution. Because multicomputer systems are highly complex, faults are bound to occur in them. Fault tolerance is crucial for keeping such systems highly available and operating efficiently. When fault tolerance is incorporated into a system, faults do not force a complete shutdown; the system continues to operate successfully in their presence, perhaps with some graceful degradation in performance. This research explores fault tolerance in multicomputer systems based on multistage interconnection networks. Using a graph-theoretic approach that remains tied to practical systems, it studies the impact of faults and, by means of a specially introduced terminology, systematically analyzes the inherent fault tolerance capabilities of such systems, mainly those based on non-redundant multistage interconnection networks. Recovery schemes that exploit these capabilities and enable the systems to operate successfully in the presence of faults are proposed. These schemes involve real-time fault detection and location and on-line recovery via reconfiguration.
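To make the graph-theoretic view of fault impact concrete, the following minimal sketch models one common non-redundant multistage interconnection network, the Omega network, and lists the source-destination pairs that a faulty switch disconnects. The choice of the Omega topology, destination-tag routing, and the names `omega_path` and `disconnected_pairs` are illustrative assumptions; the record does not state which networks or methods the research itself uses.

```python
# Illustrative sketch only: an Omega network is ASSUMED as the example of a
# non-redundant multistage interconnection network. Function names are
# hypothetical and not taken from the record.

def omega_path(src, dst, n):
    """Return the unique path from src to dst in a 2**n x 2**n Omega network
    as a list of (stage, switch_index) pairs, using destination-tag routing."""
    size = 1 << n
    pos = src
    path = []
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit line address left by one bit.
        pos = ((pos << 1) | (pos >> (n - 1))) & (size - 1)
        # The switch output port is chosen by the next destination bit (MSB first).
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        # Lines 2k and 2k+1 both belong to switch k of this stage.
        path.append((stage, pos >> 1))
    return path


def disconnected_pairs(faulty, n):
    """Return the set of (src, dst) pairs whose only path crosses a faulty switch.
    `faulty` is a set of (stage, switch_index) pairs."""
    size = 1 << n
    return {
        (s, d)
        for s in range(size)
        for d in range(size)
        if any(sw in faulty for sw in omega_path(s, d, n))
    }


if __name__ == "__main__":
    lost = disconnected_pairs({(1, 2)}, 3)
    print(len(lost), "of", (1 << 3) ** 2, "pairs lose connectivity")
```

Under these assumptions, every source-destination pair has exactly one path, so a single faulty switch in an 8 x 8 network cuts 16 of the 64 pairs while the remaining pairs stay connected. That residual connectivity is the kind of inherent capability a reconfiguration-based recovery scheme could exploit.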
- Research Organization: Texas Univ., Austin (USA)
- OSTI ID: 5705671
- Country of Publication: United States
- Language: English