DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Link failure detection in a parallel computer

Abstract

Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
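The steps in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: it assumes a mesh without wraparound links, assigns groups by the parity of the coordinate sum (one simple way to guarantee that adjacent nodes land in different groups; the abstract does not say how the assignment is made), models a failed link as a directed `(sender, receiver)` pair that drops messages, and models "notifying a user" as a returned report. The names `group_of`, `neighbors`, and `detect_failed_links` are hypothetical.

```python
from itertools import product


def group_of(coord):
    """Return 0 (first group) or 1 (second group) from the coordinate-sum parity."""
    return sum(coord) % 2


def neighbors(coord, dims):
    """Yield the adjacent nodes of coord in a rectangular mesh (no wraparound)."""
    for axis in range(len(dims)):
        for step in (-1, 1):
            candidate = list(coord)
            candidate[axis] += step
            if 0 <= candidate[axis] < dims[axis]:
                yield tuple(candidate)


def detect_failed_links(dims, broken_links=frozenset()):
    """Simulate one test round and report missing test messages.

    broken_links is an assumed failure model: a set of directed
    (sender, receiver) pairs whose messages are silently dropped.
    Returns {receiver: [adjacent senders whose message never arrived]}.
    """
    nodes = list(product(*(range(d) for d in dims)))
    received = {n: set() for n in nodes if group_of(n) == 1}

    # Each first-group node sends a test message to every adjacent node,
    # all of which belong to the second group.
    for sender in nodes:
        if group_of(sender) == 0:
            for receiver in neighbors(sender, dims):
                if (sender, receiver) not in broken_links:
                    received[receiver].add(sender)

    # Each second-group node checks that it heard from every adjacent node.
    report = {}
    for receiver, got in received.items():
        missing = [s for s in neighbors(receiver, dims) if s not in got]
        if missing:
            report[receiver] = missing
    return report
```

For example, `detect_failed_links((3, 3), {((0, 0), (0, 1))})` reports that node `(0, 1)` never heard from `(0, 0)`. The round shown here exercises only the link carrying traffic from the first group to the second group; the other link of each pair is not tested by it.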

Inventors:
Archer, Charles J. [1]; Blocksome, Michael A. [1]; Megerian, Mark G. [1]; Smith, Brian E. [1]
  1. (Rochester, MN)
Issue Date:
Research Org.:
International Business Machines Corporation (Armonk, NY)
Sponsoring Org.:
USDOE
OSTI Identifier:
1017450
Patent Number(s):
7,831,866
Application Number:
11/832,940
Assignee:
International Business Machines Corporation (Armonk, NY)
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English

Citation Formats

Archer, Charles J., Blocksome, Michael A., Megerian, Mark G., and Smith, Brian E. Link failure detection in a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J., Blocksome, Michael A., Megerian, Mark G., & Smith, Brian E. (2010). Link failure detection in a parallel computer. United States.
Archer, Charles J., Blocksome, Michael A., Megerian, Mark G., and Smith, Brian E. 2010. "Link failure detection in a parallel computer". United States. https://www.osti.gov/servlets/purl/1017450.
@article{osti_1017450,
title = {Link failure detection in a parallel computer},
author = {Archer, Charles J. and Blocksome, Michael A. and Megerian, Mark G. and Smith, Brian E.},
abstractNote = {Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2010},
month = {11}
}
