Locating hardware faults in a parallel computer
Abstract
Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
- Inventors:
- Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1176237
- Patent Number(s):
- 7697443
- Application Number:
- 11/279,592
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS > G06 - COMPUTING > G06F - ELECTRIC DIGITAL DATA PROCESSING
- H - ELECTRICITY > H04 - ELECTRIC COMMUNICATION TECHNIQUE > H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., and Smith, Brian E. Locating hardware faults in a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., & Smith, Brian E. Locating hardware faults in a parallel computer. United States.
Archer, Charles J., Megerian, Mark G., Ratterman, Joseph D., and Smith, Brian E. 2010. "Locating hardware faults in a parallel computer". United States. https://www.osti.gov/servlets/purl/1176237.
@article{osti_1176237,
title = {Locating hardware faults in a parallel computer},
author = {Archer, Charles J. and Megerian, Mark G. and Ratterman, Joseph D. and Smith, Brian E.},
abstractNote = {Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2010},
month = {4}
}