Locating hardware faults in a data communications network of a parallel computer
- Rochester, MN
Locating hardware faults in a data communications network of a parallel computer. The parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and as the root of a parent test tree; identifying, for each child compute node of the parent node, a child test tree having that child compute node as its root; running the same test suite on the parent test tree and on each child test tree; and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
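The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The tree representation (a dictionary of child lists), the function names, and the stand-in test suite (which passes on a test tree only when none of its links is in a known set of defective links) are all assumptions made for illustration. On real hardware the test suite would be actual collective-communication tests.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[int, List[int]]   # hypothetical layout: node id -> list of child ids
Link = Tuple[int, int]        # (parent id, child id)


def subtree_links(tree: Tree, root: int) -> List[Link]:
    """Collect every parent-to-child link in the test tree rooted at `root`."""
    links: List[Link] = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            links.append((node, child))
            stack.append(child)
    return links


def run_test_suite(tree: Tree, root: int, bad_links: Set[Link]) -> bool:
    """Stand-in test suite (assumption): passes only if no link in the
    test tree rooted at `root` is in the simulated set of defective links."""
    return all(link not in bad_links for link in subtree_links(tree, root))


def locate_faulty_parents(tree: Tree, root: int, bad_links: Set[Link]) -> List[int]:
    """Return nodes whose parent test tree fails while every child test tree passes."""
    suspects: List[int] = []
    stack = [root]
    while stack:
        parent = stack.pop()              # the "next compute node"
        children = tree.get(parent, [])
        stack.extend(children)
        if not children:
            continue                      # a leaf has no link to a child to blame
        parent_ok = run_test_suite(tree, parent, bad_links)
        children_ok = all(run_test_suite(tree, c, bad_links) for c in children)
        if not parent_ok and children_ok:
            suspects.append(parent)       # fault lies on a link from parent to a child
    return sorted(suspects)


if __name__ == "__main__":
    # Seven-node binary tree with one simulated defective link, from node 1 to node 4.
    example_tree: Tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
    print(locate_faulty_parents(example_tree, 0, {(1, 4)}))
```

In this sketch node 0 is not flagged even though its test tree fails, because the child test tree rooted at node 1 also fails. Only node 1, whose child test trees all pass, is identified as the parent with a defective link.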
- Research Organization:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- B519700
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Number(s):
- 7,646,721
- Application Number:
- 11/279,586
- OSTI ID:
- 1015187
- Country of Publication:
- United States
- Language:
- English