Fault tolerance in modular multiprocessor systems
Design methodologies for fault-tolerant modular multiprocessor systems are presented. Designing such a system involves two tasks: choosing a spare allocation strategy and designing a reconfiguration algorithm that can work with different spare allocation strategies. An architecture-independent spare allocation strategy that maximizes system reliability under given hardware constraints is proposed. In addition, three reconfiguration approaches are proposed, each based on different assumptions and each applicable to any spare allocation strategy. The first approach uses hardware switches to achieve fault tolerance. In this approach, a module controller initiates the reconfiguration whenever a fault occurs, so the reconfiguration is not distributed. In the second approach, reconfiguration is initiated by the spare node that replaces a faulty node; it is performed in a distributed manner and is achieved by multiplexing several channels onto one link. These two approaches are suitable for systems in which performance degradation is not acceptable. If performance degradation is allowed, a third approach based on fault-tolerant routing can be used. This approach uses a two-phase fault-tolerant routing algorithm to route messages to their destinations in the presence of faults.
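The abstract does not give the allocation strategy itself, so the following is only an illustrative sketch of what "maximizing system reliability under a hardware constraint" can look like. It assumes a simple model that is not taken from the record: every node works independently with the same probability `r`, a spare can replace any failed node in its own module, switching is perfect, and the system works only if every module works. All function names and the exhaustive search are hypothetical choices made for illustration.

```python
from itertools import product
from math import comb, prod


def module_reliability(active, spares, r):
    """Probability that at most `spares` of the module's nodes fail,
    i.e. at least `active` of its `active + spares` nodes still work."""
    total = active + spares
    return sum(
        comb(total, k) * (1 - r) ** k * r ** (total - k)
        for k in range(spares + 1)
    )


def system_reliability(actives, allocation, r):
    """Series system (assumed): every module must survive."""
    return prod(
        module_reliability(a, s, r) for a, s in zip(actives, allocation)
    )


def best_allocation(actives, budget, r):
    """Exhaustively try every way to split `budget` spares across modules
    and return the split with the highest system reliability.
    Only practical for small systems; shown for clarity, not efficiency."""
    candidates = (
        alloc
        for alloc in product(range(budget + 1), repeat=len(actives))
        if sum(alloc) == budget
    )
    return max(
        candidates, key=lambda alloc: system_reliability(actives, alloc, r)
    )


if __name__ == "__main__":
    # Two modules of 4 active nodes each, 2 spares to place, node reliability 0.9.
    alloc = best_allocation([4, 4], budget=2, r=0.9)
    print(alloc, system_reliability([4, 4], alloc, 0.9))
```

Under these assumptions, spreading the spares across modules tends to beat concentrating them in one module, because the least reliable module dominates a series system. A real strategy would also have to account for the interconnect and switching hardware, which this sketch ignores.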
- Research Organization: Pittsburgh Univ., PA (United States)
- OSTI ID: 5254206
- Country of Publication: United States
- Language: English
Similar Records
Designing and reconfiguring fault-tolerant multiprocessor systems
Hardware reconfiguration for fault-tolerant processor arrays