OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Designing and reconfiguring fault-tolerant multiprocessor systems

Miscellaneous · OSTI ID: 7046530

This thesis presents a general theory for designing multiprocessor computer systems that can tolerate faulty processors. It is especially concerned with structural fault tolerance, defined as the ability to reconfigure around faults in order to preserve the interconnection structure of a multiprocessor. A major goal is to model some important practical design features not previously addressed, including applicability to any multiprocessor structure and any number of faults. Low hardware overhead and efficient reconfigurability are also important goals. The systems of interest and their faults are represented by graphs, and reconfiguration is modeled by graph-to-graph mappings that replace faulty structures with nonfaulty ones. Within this framework, two general design methodologies for fault tolerance are defined. The first approach, called node covering, performs reconfiguration by mapping a node (processor) to one of a specific subset of other nodes called its covers. The relation between nodes and their covers is represented efficiently by covering graphs. The author shows how to design k-fault-tolerant trees from their covering graphs. The resulting designs are near-optimal with respect to hardware cost. He also generalizes the node-covering approach to arbitrary multiprocessor graphs, and demonstrates that the resulting fault-tolerant designs have low-cost practical implementations. The second design theory uses graph automorphisms to represent the reconfiguration process. He demonstrates the efficacy of this theory by applying it to hypercube multiprocessors, and obtains fault-tolerant designs that are superior to those proposed in previous work. He also applies automorphisms to local sparing, which associates spare nodes with disjoint groups of processors to simplify reconfiguration.
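
The node-covering idea described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names (find_cover_path, reconfigure), the dictionary representation of the covering graph, and the five-node chain used as the example are all assumptions made for illustration. The sketch shows only how roles shift along a path of covers until a free spare is reached; it does not check that the interconnection structure is preserved, which depends on the extra links chosen by the actual design.

```python
from collections import deque


def find_cover_path(covers, start, faulty, free):
    """Breadth-first search in the covering graph (hypothetical representation:
    dict node -> list of covers) from a failed node to a free, nonfaulty node.
    Returns the path as a list of nodes, or None if no such path exists."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for cover in covers.get(node, []):
            if cover in parent or cover in faulty:
                continue
            parent[cover] = node
            if cover in free:
                # Walk back to the failed node to recover the path.
                path = [cover]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(cover)
    return None


def reconfigure(covers, assignment, failed, already_faulty=()):
    """Return a new role -> node assignment after `failed` breaks, or None
    if the covering graph offers no path to a free spare.
    `assignment` maps each logical role to the physical node performing it."""
    faulty = set(already_faulty) | {failed}
    used = set(assignment.values())
    nodes = set(covers) | {c for cs in covers.values() for c in cs}
    free = {n for n in nodes if n not in used and n not in faulty}

    role_of = {node: role for role, node in assignment.items()}
    if failed not in role_of:
        return dict(assignment)  # the failed node carried no role

    path = find_cover_path(covers, failed, faulty, free)
    if path is None:
        return None

    # Every node on the path hands its role to the next node (its cover).
    new_assignment = dict(assignment)
    for src, dst in zip(path, path[1:]):
        new_assignment[role_of[src]] = dst
    return new_assignment


# Assumed example: four active nodes in a chain, each covered by its
# successor, with a single spare "s" covering the last node.
covers = {"p0": ["p1"], "p1": ["p2"], "p2": ["p3"], "p3": ["s"], "s": []}
assignment = {0: "p0", 1: "p1", 2: "p2", 3: "p3"}

after_one_fault = reconfigure(covers, assignment, "p1")
after_two_faults = reconfigure(covers, after_one_fault, "p3", already_faulty={"p1"})
```

In this assumed example, a fault at p1 shifts roles 1, 2, and 3 onto p2, p3, and the spare s. A second fault cannot be absorbed because only one spare exists, so the second call returns None; tolerating k faults would require a covering graph with enough spares and covers, which is the design problem the thesis addresses.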

Research Organization:
Michigan Univ., Ann Arbor, MI (United States)
Resource Relation:
Other Information: Thesis (Ph.D.)
Country of Publication:
United States
Language:
English

Similar Records

Hardware reconfiguration for fault-tolerant processor arrays
Miscellaneous · Jan 1, 1989 · OSTI ID:7046530

FTN topology and protocols
Journal Article · Jan 1, 1991 · Journal of Parallel and Distributed Computing; (United States) · OSTI ID:7046530

Fault tolerance in modular multiprocessor systems
Miscellaneous · Jan 1, 1991 · OSTI ID:7046530