U.S. Department of Energy
Office of Scientific and Technical Information

Fault tolerance in multistage interconnection network-based multicomputer systems

Thesis/Dissertation
OSTI ID:5705671

Multistage interconnection networks are one of the very few cost-effective alternatives for connecting a large number of computers for parallel execution. Because multicomputer systems are highly complex, faults are bound to occur in them. Fault tolerance is crucial for retaining high availability of these systems and ensuring their efficient operation. When fault tolerance is incorporated into a system, faults do not force a complete shutdown, and the system is able to continue operating successfully in their presence, perhaps with some graceful degradation in performance. This research explores fault tolerance in multistage interconnection network-based multicomputer systems. Using a graph-theoretic approach that remains tied to practical systems, it studies the impact of faults and, by means of specially introduced terminology, systematically analyzes the inherent fault tolerance capabilities of such systems, mainly those based on nonredundant multistage interconnection networks. Recovery schemes that use these capabilities to let the systems operate successfully in the presence of faults are proposed. These schemes involve real-time fault detection and location and on-line recovery via reconfiguration.

Research Organization:
Texas Univ., Austin (USA)
Country of Publication:
United States
Language:
English