OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Fault-tolerance considerations in large, multiple-processor systems

Journal Article · Computer; (United States)

Researchers have long speculated about the possibility of constructing large, massively parallel computing engines by interconnecting many conventional processing elements to form an integrated supersystem. The rapid expansion in very large scale integration (VLSI) circuit technology during the past decade has accelerated research in this direction. As advances in VLSI push basic component, or chip, functionality to the processor level and beyond, it becomes natural to view complex processing elements as the basic components of much larger systems. Several names for such systems have been proposed, including network computers, multicomputers, and distributed multiprocessors. Despite the naming differences, these systems share the following salient features: (1) A large number of essentially autonomous processing elements interconnected by a structure that allows high-bandwidth communication between them. At the system level, these processing elements and interconnection facilities are viewed as the basic components of the system. Each processing node has its own local memory, and no memory is shared between nodes. (2) A high degree of distribution of control or operating system functions among the processing elements. (3) Highly parallel computation performed by constructing applications as collections of several or many distinct tasks. These tasks may execute concurrently on different processors, with the necessary intertask communication carried out over the communication facilities linking the nodes. The collection of cooperating tasks comprising an application is sometimes referred to as a task force.

Research Organization:
Univ. of Iowa
OSTI ID:
6007556
Journal Information:
Computer; (United States), Vol. 19:3
Country of Publication:
United States
Language:
English