U.S. Department of Energy
Office of Scientific and Technical Information

Designing and reconfiguring fault-tolerant multiprocessor systems

Thesis/Dissertation · OSTI ID:7046530

This thesis presents a general theory for designing multiprocessor computer systems that can tolerate faulty processors. It is especially concerned with structural fault tolerance, defined as the ability to reconfigure around faults in order to preserve the interconnection structure of a multiprocessor. A major goal is to model some important practical design features not previously addressed, including applicability to any multiprocessor structure and any number of faults. Low hardware overhead and efficient reconfigurability are also important goals. The systems of interest and their faults are represented by graphs, and reconfiguration is modeled by graph-to-graph mappings that replace faulty structures with nonfaulty ones. Within this framework, two general design methodologies for fault tolerance are defined. The first approach, called node covering, performs reconfiguration by mapping a node (processor) to one of a specific subset of other nodes called its covers. The relation between nodes and their covers is represented efficiently by covering graphs. The author shows how to design k-fault-tolerant trees from their covering graphs. The resulting designs are near-optimal with respect to hardware cost. The author also generalizes the node-covering approach to arbitrary multiprocessor graphs and demonstrates that the resulting fault-tolerant designs have low-cost practical implementations. The second design theory uses graph automorphisms to represent the reconfiguration process. The author demonstrates the efficacy of this theory by applying it to hypercube multiprocessors and obtains fault-tolerant designs that are superior to those proposed in previous work. The author also applies automorphisms to local sparing, which associates spare nodes with disjoint groups of processors to simplify reconfiguration.
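The node-covering idea described in the abstract can be illustrated with a small sketch. The code below is not the thesis's algorithm. It assumes a covering graph stored as a dictionary from each node to its list of covers, and it searches for a chain of nonfaulty covers that ends at a spare. The names `find_cover_chain`, `remap_roles`, and the example nodes are hypothetical. The sketch also does not check that the links needed to preserve the interconnection structure exist.

```python
from collections import deque


def find_cover_chain(covers, spares, faulty, failed):
    """Return a chain [failed, c1, ..., spare] of successive covers, or None.

    covers: dict mapping a node to the nodes allowed to replace it (assumed layout)
    spares: set of spare nodes
    faulty: set of faulty nodes, including `failed`
    """
    parent = {failed: None}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for cover in covers.get(node, ()):
            # Skip covers already visited or themselves faulty.
            if cover in parent or cover in faulty:
                continue
            parent[cover] = node
            if cover in spares:
                # Walk back to the failed node to recover the chain.
                chain = [cover]
                while parent[chain[-1]] is not None:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(cover)
    return None


def remap_roles(chain):
    """Map each node's role to the next node in the chain, which takes it over."""
    return {chain[i]: chain[i + 1] for i in range(len(chain) - 1)}


# Hypothetical covering graph: b covers a, c covers b, spare s covers c.
covers = {"a": ["b"], "b": ["c"], "c": ["s"]}
chain = find_cover_chain(covers, spares={"s"}, faulty={"a"}, failed="a")
mapping = remap_roles(chain)
```

Here the role of the failed node `a` moves to `b`, the role of `b` moves to `c`, and the spare `s` takes over the role of `c`. If a second fault blocks every chain to a spare, `find_cover_chain` returns `None`.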

Research Organization: Michigan Univ., Ann Arbor, MI (United States)
OSTI ID: 7046530
Country of Publication: United States
Language: English

Similar Records

Hardware reconfiguration for fault-tolerant processor arrays
Thesis/Dissertation · Sat Dec 31 23:00:00 EST 1988 · OSTI ID:6037561

Fault tolerance in modular multiprocessor systems
Thesis/Dissertation · Mon Dec 31 23:00:00 EST 1990 · OSTI ID:5254206

The full-use-of-suitable-spares (FUSS) approach to hardware reconfiguration for fault-tolerant processor arrays
Journal Article · Sat Mar 31 23:00:00 EST 1990 · IEEE Transactions on Computers (Institute of Electrical and Electronics Engineers); (USA) · OSTI ID:6782267