Designing and reconfiguring fault-tolerant multiprocessor systems
This thesis presents a general theory for designing multiprocessor computer systems that can tolerate faulty processors. It is especially concerned with structural fault tolerance, defined as the ability to reconfigure around faults in order to preserve the interconnection structure of a multiprocessor. A major goal is to model some important practical design features not previously addressed, including applicability to any multiprocessor structure and any number of faults. Low hardware overhead and efficient reconfigurability are also important goals. The systems of interest and their faults are represented by graphs, and reconfiguration is modeled by graph-to-graph mappings that replace faulty structures with nonfaulty ones. Within this framework, two general design methodologies for fault tolerance are defined. The first approach, called node covering, performs reconfiguration by mapping a node (processor) to one of a specific subset of other nodes called its covers. The relation between nodes and their covers is represented efficiently by covering graphs. The author shows how to design k-fault-tolerant trees from their covering graphs. The resulting designs are near-optimal with respect to hardware cost. The author also generalizes the node-covering approach to arbitrary multiprocessor graphs, and demonstrates that the resulting fault-tolerant designs have low-cost practical implementations. The second design theory uses graph automorphisms to represent the reconfiguration process. The author demonstrates the efficacy of this theory by applying it to hypercube multiprocessors, and obtains fault-tolerant designs that are superior to those proposed in previous work. The author also applies automorphisms to local sparing, which associates spare nodes with disjoint groups of processors to simplify reconfiguration.
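The node-covering idea described above can be illustrated with a small sketch. This is not the thesis's construction: it assumes a hypothetical 4-processor linear array with one spare, a covering relation in which each node has exactly one cover (its right-hand neighbor), and extra "skip" links chosen so that the array structure survives a single fault. The names `reconfigure` and `structure_preserved` are illustrative only.

```python
def reconfigure(cover, assignment, faulty):
    """Shift logical roles along the cover chain, starting at the faulty node.

    cover:      physical node -> its single cover (None at the end of the chain)
    assignment: logical role -> physical node currently playing that role
    Returns a new assignment that no longer uses the faulty node.
    """
    role_of = {phys: role for role, phys in assignment.items()}
    new_assignment = dict(assignment)
    node = faulty
    # Each displaced node hands its role to its cover, which may in turn
    # displace the cover's own role, until an idle (spare) node is reached.
    while node in role_of:
        nxt = cover.get(node)
        if nxt is None:
            raise RuntimeError("cover chain exhausted: no spare available")
        new_assignment[role_of[node]] = nxt
        node = nxt
    return new_assignment


def structure_preserved(logical_edges, physical_edges, assignment):
    """Check that every logical link maps onto an existing physical link."""
    return all(
        frozenset((assignment[a], assignment[b])) in physical_edges
        for a, b in logical_edges
    )


# Assumed example: physical nodes 0..4, where node 4 is the spare.
n = 4
cover = {i: i + 1 for i in range(n)}
cover[n] = None
assignment = {r: r for r in range(n)}            # role r starts on node r
logical_edges = [(r, r + 1) for r in range(n - 1)]
# Physical links: neighbor links plus skip links (an assumption of this sketch).
physical_edges = {frozenset((i, i + 1)) for i in range(n)} | {
    frozenset((i, i + 2)) for i in range(n - 1)
}

new_assignment = reconfigure(cover, assignment, faulty=1)
print(new_assignment)
print(structure_preserved(logical_edges, physical_edges, new_assignment))
```

In this toy case the fault at node 1 pushes roles 1, 2, and 3 one position toward the spare, and the skip links are what allow the reconfigured system to retain the original linear-array interconnection.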
- Research Organization: Michigan Univ., Ann Arbor, MI (United States)
- OSTI ID: 7046530
- Resource Relation: Other Information: Thesis (Ph.D.)
- Country of Publication: United States
- Language: English
Similar Records
FTN topology and protocols
Fault tolerance in modular multiprocessor systems