OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: Methods and apparatus using commutative error detection values for fault isolation in multiple node computers

Patent ·
OSTI ID:983062
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [8];  [9];  [10];  [11];  [12]
  1. Ardsley, NY
  2. Ridgefield, CT
  3. Croton-On-Hudson, NY
  4. Yorktown, NY
  5. Mount Kisco, NY
  6. Irvington, NY
  7. Cortlandt Manor, NY
  8. Ossining, NY
  9. Mississauga, CA
  10. Wernau, DE
  11. Brewster, NY
  12. Bedford Hills, NY

Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and isolate faulty nodes. When a node injects information associated with a reproducible portion of a computer program into a network, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves the commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing the commutative error detection values that a particular node generated for reproducible portions of the application program in different runs of the application program. Differences in values indicate a possible faulty node.
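
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the commutative error detection value is a sum of per-packet CRC32 values modulo 2**32, so the result does not depend on the order in which packets are injected. The function names, the (node, interval) keys, and the packet payloads are all hypothetical and chosen only for illustration.

```python
import zlib

MASK32 = 0xFFFFFFFF  # keep the running sum within 32 bits


def commutative_checksum(packets):
    """Order-independent checksum: sum of per-packet CRC32 values mod 2**32.

    Assumption: addition stands in for whatever commutative operation
    the real hardware uses; the abstract does not specify one.
    """
    total = 0
    for payload in packets:
        total = (total + zlib.crc32(payload)) & MASK32
    return total


def record_run(injected):
    """Map each hypothetical (node_id, interval) key to its checksum."""
    return {key: commutative_checksum(pkts) for key, pkts in injected.items()}


def suspect_nodes(run_a, run_b):
    """Return sorted node ids whose checksums differ between two runs."""
    shared = run_a.keys() & run_b.keys()
    return sorted({node for (node, interval) in shared
                   if run_a[(node, interval)] != run_b[(node, interval)]})


# Two runs of the same reproducible program portion (made-up payloads).
first = record_run({(0, 0): [b"a", b"b"], (1, 0): [b"c", b"d"]})
# Node 0 injects the same packets in a different order; node 1 corrupts one.
second = record_run({(0, 0): [b"b", b"a"], (1, 0): [b"c", b"X"]})

print(suspect_nodes(first, second))  # [1]
```

Because the checksum is commutative, the reordered packets from node 0 produce the same value in both runs, while the altered packet from node 1 produces a mismatch that flags that node as possibly faulty.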

Research Organization:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Number(s):
7,383,490
Application Number:
11/106,069
OSTI ID:
983062
Country of Publication:
United States
Language:
English

Similar Records

A top-down approach to high-consequence fault analysis for software systems
Conference · 1997-04-01 · OSTI ID:983062

Compiler-Assisted Detection of Transient Memory Errors
Conference · 2014-06-09 · OSTI ID:983062

Fault tolerance for VLSI multicomputers
Thesis/Dissertation · 1985-01-01 · OSTI ID:983062

Related Subjects