DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Locating hardware faults in a data communications network of a parallel computer

Abstract

Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree; identifying, for each child compute node of the parent node, a child test tree having the child compute node as root; running the same test suite on the parent test tree and each child test tree; and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
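The decision rule in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the dictionary representation of the tree, the breadth-first order used to pick the "next" compute node, the `run_suite` callback, and the simulated test suite `make_fake_suite`, which fails whenever a broken link lies inside the tree under test. The sketch also skips leaf nodes, since a node with no children has no parent-to-child link to blame.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical representation: each node maps to the list of its children.
Tree = Dict[str, List[str]]


def subtree(tree: Tree, root: str) -> List[str]:
    """Return every node in the test tree rooted at `root`."""
    nodes, stack = [], [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(tree.get(node, []))
    return nodes


def find_defective_parent(
    tree: Tree,
    root: str,
    run_suite: Callable[[str, List[str]], bool],
) -> Optional[str]:
    """Return the first node whose parent test tree fails while all of its
    child test trees pass, or None if no node matches that pattern."""
    pending = [root]
    while pending:
        parent = pending.pop(0)  # breadth-first order is an assumption
        children = tree.get(parent, [])
        parent_ok = run_suite(parent, subtree(tree, parent))
        children_ok = [run_suite(c, subtree(tree, c)) for c in children]
        # Leaves are skipped: they have no link to a child.
        if children and not parent_ok and all(children_ok):
            return parent
        pending.extend(children)
    return None


def make_fake_suite(
    broken_links: Set[Tuple[str, str]],
) -> Callable[[str, List[str]], bool]:
    """Simulated test suite (illustration only): it passes on a test tree
    unless a broken (parent, child) link has both ends inside that tree."""
    def run_suite(root: str, nodes: List[str]) -> bool:
        members = set(nodes)
        return not any(p in members and c in members for p, c in broken_links)
    return run_suite


# Example: a five-node tree with one broken link between n1 and n3.
example_tree: Tree = {"n0": ["n1", "n2"], "n1": ["n3", "n4"]}
suspect = find_defective_parent(
    example_tree, "n0", make_fake_suite({("n1", "n3")})
)
```

In this example the suite fails on the trees rooted at `n0` and `n1`, but passes on the single-node trees rooted at `n3` and `n4`, so `n1` is reported as the node with a defective link to one of its children.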

Inventors:
Archer, Charles J [1]; Megerian, Mark G [1]; Ratterman, Joseph D [1]; Smith, Brian E [1]
  1. Rochester, MN
Issue Date:
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1015187
Patent Number(s):
7646721
Application Number:
11/279,586
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
DOE Contract Number:  
B519700
Resource Type:
Patent
Country of Publication:
United States
Language:
English

Citation Formats

Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, and Smith, Brian E. Locating hardware faults in a data communications network of a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, & Smith, Brian E. Locating hardware faults in a data communications network of a parallel computer. United States.
Archer, Charles J, Megerian, Mark G, Ratterman, Joseph D, and Smith, Brian E. 2010. "Locating hardware faults in a data communications network of a parallel computer". United States. https://www.osti.gov/servlets/purl/1015187.
@article{osti_1015187,
title = {Locating hardware faults in a data communications network of a parallel computer},
author = {Archer, Charles J and Megerian, Mark G and Ratterman, Joseph D and Smith, Brian E},
abstractNote = {Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree; identifying, for each child compute node of the parent node, a child test tree having the child compute node as root; running the same test suite on the parent test tree and each child test tree; and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2010},
month = {1}
}