Locating hardware faults in a data communications network of a parallel computer
Abstract
Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and the root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as its root, running the same test suite on the parent test tree and on each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
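The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tree representation (a dictionary mapping each node to its children), the function names `subtree_nodes`, `locate_faulty_parents`, and `make_simulated_suite`, and the simulated test suite (which fails whenever a test tree contains a link listed as broken) are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical representation: each node name maps to its list of children.
Tree = Dict[str, List[str]]


def subtree_nodes(tree: Tree, root: str) -> List[str]:
    """Return every node in the test tree rooted at `root`."""
    stack, nodes = [root], []
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(tree.get(node, []))
    return nodes


def locate_faulty_parents(
    tree: Tree, root: str, suite_passes: Callable[[str], bool]
) -> List[str]:
    """Flag each parent whose test tree fails while all child test trees pass."""
    faulty = []
    for parent in subtree_nodes(tree, root):
        children = tree.get(parent, [])
        if not children:
            continue  # a leaf has no links to children to test
        parent_ok = suite_passes(parent)
        children_ok = all(suite_passes(child) for child in children)
        if not parent_ok and children_ok:
            # The failure cannot lie below the children, so it is attributed
            # to a link from this parent to one of its children.
            faulty.append(parent)
    return faulty


def make_simulated_suite(
    tree: Tree, broken_links: Set[Tuple[str, str]]
) -> Callable[[str], bool]:
    """Assumed stand-in for a real test suite: fails if the tree has a broken link."""
    def suite_passes(test_root: str) -> bool:
        for node in subtree_nodes(tree, test_root):
            for child in tree.get(node, []):
                if (node, child) in broken_links:
                    return False
        return True
    return suite_passes


# Usage: a seven-node binary tree with one simulated defective link, n1 -> n4.
tree = {"n0": ["n1", "n2"], "n1": ["n3", "n4"], "n2": ["n5", "n6"]}
suite = make_simulated_suite(tree, {("n1", "n4")})
faulty = locate_faulty_parents(tree, "n0", suite)
print(faulty)  # ['n1']
```

In this example the suite also fails on the tree rooted at `n0`, but `n0` is not flagged because the child test tree rooted at `n1` fails as well; only `n1` fails while all of its children pass.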
- Inventors:
- Archer, Charles J; Megerian, Mark G; Ratterman, Joseph D; Smith, Brian E
- Rochester, MN
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1015187
- Patent Number(s):
- 7646721
- Application Number:
- 11/279,586
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- H - ELECTRICITY; H04 - ELECTRIC COMMUNICATION TECHNIQUE; H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- DOE Contract Number:
- B519700
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, and Smith, Brian E. Locating hardware faults in a data communications network of a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, & Smith, Brian E. Locating hardware faults in a data communications network of a parallel computer. United States.
Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, and Smith, Brian E. Tue .
"Locating hardware faults in a data communications network of a parallel computer". United States. https://www.osti.gov/servlets/purl/1015187.
@article{osti_1015187,
title = {Locating hardware faults in a data communications network of a parallel computer},
author = {Archer, Charles J and Megerian, Mark G and Ratterman, Joseph D and Smith, Brian E},
abstractNote = {Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and the root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as its root, running the same test suite on the parent test tree and on each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2010},
month = {1}
}