DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Link failure detection in a parallel computer

Abstract

Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
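The procedure in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the patented implementation: it assumes a mesh without wraparound links, assigns groups by the parity of each node's coordinate sum (one way to guarantee that adjacent nodes land in different groups), and models each link direction as a directed (sender, receiver) pair. The names `group_of`, `neighbors`, `detect_failed_links`, and the `failed_links` fault model are hypothetical and chosen only for illustration.

```python
from itertools import product


def group_of(coord):
    """Checkerboard assignment: 0 = first group, 1 = second group."""
    return sum(coord) % 2


def neighbors(coord, dims):
    """Yield the adjacent nodes of coord in a mesh with no wraparound."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            value = coord[axis] + step
            if 0 <= value < size:
                yield coord[:axis] + (value,) + coord[axis + 1:]


def detect_failed_links(dims, failed_links):
    """Simulate one test round on a mesh of the given dimensions.

    failed_links is a set of directed (sender, receiver) pairs that drop
    messages (an assumed fault model). Returns a dict mapping each
    second-group node to the first-group neighbors whose test message
    did not arrive.
    """
    nodes = list(product(*(range(n) for n in dims)))

    # Sending phase: every first-group node sends to each adjacent node,
    # all of which belong to the second group.
    received = {node: set() for node in nodes if group_of(node) == 1}
    for sender in nodes:
        if group_of(sender) != 0:
            continue
        for receiver in neighbors(sender, dims):
            if (sender, receiver) not in failed_links:
                received[receiver].add(sender)

    # Checking phase: every second-group node compares what arrived
    # against what it expected, and reports any missing senders.
    report = {}
    for node, got in received.items():
        missing = [n for n in neighbors(node, dims) if n not in got]
        if missing:
            report[node] = missing
    return report


if __name__ == "__main__":
    broken = {((0, 0), (0, 1))}
    print(detect_failed_links((3, 3), broken))
```

This round exercises only the links that carry traffic from the first group to the second. Because each pair of adjacent nodes is connected by a pair of links, checking the opposite direction would presumably take a second round with the roles swapped; the abstract describes only the first test message.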

Inventors:
 Archer, Charles J [1]; Blocksome, Michael A [1]; Megerian, Mark G [1]; Smith, Brian E [1]
  1. Rochester, MN
Issue Date:
2010-11-09
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1017450
Patent Number(s):
7831866
Application Number:
11/832,940
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English

Citation Formats

Archer, Charles J, Blocksome, Michael A, Megerian, Mark G, and Smith, Brian E. Link failure detection in a parallel computer. United States: N. p., 2010. Web.
Archer, Charles J, Blocksome, Michael A, Megerian, Mark G, & Smith, Brian E. (2010). Link failure detection in a parallel computer. United States.
Archer, Charles J, Blocksome, Michael A, Megerian, Mark G, and Smith, Brian E. 2010. "Link failure detection in a parallel computer". United States. https://www.osti.gov/servlets/purl/1017450.
@misc{osti_1017450,
title = {Link failure detection in a parallel computer},
author = {Archer, Charles J and Blocksome, Michael A and Megerian, Mark G and Smith, Brian E},
abstractNote = {Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.},
note = {U.S. Patent 7831866},
url = {https://www.osti.gov/servlets/purl/1017450},
place = {United States},
year = {2010},
month = nov
}
