Fault tolerant massively parallel processing architecture
This paper presents two massively parallel processing architectures suited to a wide variety of divide-and-conquer algorithms for problems such as the discrete Fourier transform, production systems, and design automation. The first architecture, the Chain-structured Butterfly ARchitecture (CBAR), is a two-dimensional array of $N = L(\log_2 L + 1)$ processing elements (PEs) organized as $L$ levels of $\log_2 L + 1$ stages. PEs in consecutive stages are joined by butterfly connections, and PEs in the last stage feed back straight through to the first stage. This connection scheme allows thousands of PEs to be interconnected with $O(N)$ connection cost, $O(\log_2(N/\log_2 N))$ communication path length, and only four I/O ports per PE. However, the CBAR is not fault tolerant. The authors therefore propose a second architecture, the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), a modified version of the CBAR. The RECBAR retains all the desirable features of the CBAR, increases the number of I/O ports per PE to six, and uses $O((\log_2 N)/N)$ overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. The reliability improvements of the RECBAR over the CBAR are studied. The paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and restructure itself accordingly within $2\log_2 L + 1$ time steps, making it a truly fault tolerant architecture.
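The CBAR sizing formulas can be illustrated with a small topology builder. This is a minimal sketch, not the authors' construction. The abstract does not say which level-index bit each butterfly stage exchanges or how the feedback links are wired beyond "straight-through", so the sketch assumes that stage $s$ links PE $(l, s)$ to $(l, s+1)$ and to $(l \oplus 2^s, s+1)$, and that each last-stage PE feeds back to the first-stage PE on the same level. The function names `cbar_links`, `port_counts`, `num_pes`, and `diagnosis_steps` are hypothetical. Under these assumptions the link count comes out to $2N - L$, which is $O(N)$, and no PE uses more than four ports.

```python
from collections import defaultdict


def num_pes(L):
    """N = L * (log2(L) + 1) processing elements, as stated in the abstract."""
    return L * (L.bit_length() - 1 + 1)


def diagnosis_steps(L):
    """Upper bound 2*log2(L) + 1 on RECBAR diagnosis/structuring time steps."""
    return 2 * (L.bit_length() - 1) + 1


def cbar_links(L):
    """Build a hypothetical CBAR-style topology as a list of directed links.

    PEs are identified by (level, stage). ASSUMED wiring (not from the paper):
    butterfly stage s connects (l, s) to (l, s+1) and to (l ^ 2**s, s+1);
    each PE in the last stage feeds back straight to the same level in stage 0.
    """
    if L < 2 or L & (L - 1):
        raise ValueError("L must be a power of two >= 2")
    k = L.bit_length() - 1          # log2(L) butterfly link stages
    stages = k + 1                  # log2(L) + 1 PE stages
    links = []
    for s in range(k):
        for l in range(L):
            links.append(((l, s), (l, s + 1)))             # straight link
            links.append(((l, s), (l ^ (1 << s), s + 1)))  # cross (butterfly) link
    for l in range(L):
        links.append(((l, k), (l, 0)))                     # straight-through feedback
    return stages, links


def port_counts(links):
    """Count I/O ports per PE (one port per incident link end)."""
    ports = defaultdict(int)
    for src, dst in links:
        ports[src] += 1
        ports[dst] += 1
    return ports


if __name__ == "__main__":
    L = 8
    stages, links = cbar_links(L)
    ports = port_counts(links)
    print("stages:", stages, "PEs:", num_pes(L), "links:", len(links))
    print("max ports per PE:", max(ports.values()))
    print("diagnosis bound (time steps):", diagnosis_steps(L))
```

With this wiring, interior-stage PEs have two inputs and two outputs, while first- and last-stage PEs have three ports each, so the maximum of four matches the figure quoted in the abstract. A different bit ordering across stages would change which PEs are connected but not these counts.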
- Research Organization:
- Computer Systems Group, Coordinated Science Lab., Univ. of Illinois, Urbana, IL (US)
- OSTI ID:
- 7106966
- Journal Information:
- J. Parallel Distrib. Comput.; (United States), Vol. 4:4
- Country of Publication:
- United States
- Language:
- English
Similar Records
An architecture for a wafer-scale-implemented MIMD parallel computer
Parallel processing of production systems: an integrated software and hardware approach
Related Subjects
ARRAY PROCESSORS
COMPUTER ARCHITECTURE
FOURIER TRANSFORMATION
INTEGRATED CIRCUITS
FAULT TOLERANT COMPUTERS
RELIABILITY
COMPUTERS
DIGITAL COMPUTERS
ELECTRONIC CIRCUITS
INTEGRAL TRANSFORMATIONS
MICROELECTRONIC CIRCUITS
TRANSFORMATIONS
990210* - Supercomputers- (1987-1989)