DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Locating hardware faults in a parallel computer

Abstract

Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
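The partitioning described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: it assumes a complete binary tree stored in heap order (node 0 is the root), test levels of exactly two adjacent tiers, and two sets of levels offset by one tier. All function names (level_sets, cells_in_level, covered_links, schedule) are hypothetical. The sketch only enumerates which cells would be tested in which phase; it does not model the uplink or downlink tests themselves.

```python
from typing import List, Set, Tuple

Level = Tuple[int, int]        # (upper tier, lower tier) of one test level
Cell = Tuple[int, List[int]]   # (subtree root, its children inside the level)


def level_sets(n_tiers: int) -> List[List[Level]]:
    """Build sets of non-overlapping two-tier test levels.

    Assumption: one set starts at tier 0, the other at tier 1, so that
    every parent-child link falls inside a level of one of the sets.
    """
    sets = []
    for offset in (0, 1):
        levels = [(t, t + 1) for t in range(offset, n_tiers - 1, 2)]
        if levels:
            sets.append(levels)
    return sets


def cells_in_level(level: Level, n_nodes: int) -> List[Cell]:
    """Return the test cells of one level: each upper-tier node plus its children."""
    top, _ = level
    first = 2 ** top - 1                          # first heap index in tier `top`
    last = min(2 ** (top + 1) - 2, n_nodes - 1)   # last heap index in tier `top`
    cells = []
    for root in range(first, last + 1):
        kids = [c for c in (2 * root + 1, 2 * root + 2) if c < n_nodes]
        if kids:
            cells.append((root, kids))
    return cells


def covered_links(n_tiers: int) -> Set[Tuple[int, int]]:
    """Collect every (parent, child) link that lies inside some test cell."""
    n_nodes = 2 ** n_tiers - 1
    links: Set[Tuple[int, int]] = set()
    for levels in level_sets(n_tiers):
        for level in levels:
            for root, kids in cells_in_level(level, n_nodes):
                links.update((root, kid) for kid in kids)
    return links


def schedule(n_tiers: int) -> List[Tuple[str, int, List[Cell]]]:
    """List the test phases: uplink on each set, then downlink on each set."""
    n_nodes = 2 ** n_tiers - 1
    sets = level_sets(n_tiers)
    plan = []
    for phase in ("uplink", "downlink"):
        for index, levels in enumerate(sets):
            cells = [c for lv in levels for c in cells_in_level(lv, n_nodes)]
            plan.append((phase, index, cells))
    return plan


if __name__ == "__main__":
    for phase, index, cells in schedule(4):
        print(phase, "set", index, "cells:", cells)
```

Offsetting the second set by one tier is what lets the two sets together include every link: a link between tiers t and t+1 falls in the first set when t is even and in the second when t is odd.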

Inventors:
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
Issue Date:
2010-04-13
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1176237
Patent Number(s):
7697443
Application Number:
11/279,592
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., and Smith, Brian E. Locating hardware faults in a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., & Smith, Brian E. (2010). Locating hardware faults in a parallel computer. United States.
Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., and Smith, Brian E. 2010. "Locating hardware faults in a parallel computer". United States. https://www.osti.gov/servlets/purl/1176237.
@article{osti_1176237,
title = {Locating hardware faults in a parallel computer},
author = {Archer, Charles J. and Megerian, Mark G. and Ratterman, Joseph D. and Smith, Brian E.},
abstractNote = {Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.},
place = {United States},
year = {2010},
month = {4}
}