Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set
Patent · OSTI ID: 1018213
- Rochester, MN
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system and identifies nodes that exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes that do not themselves belong to the group. A node that is not itself in the group but has a large number of neighbors in the group is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to the number of adjoining members of the group.
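The grouping and ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: nodes are addressed by coordinates on a torus, a traceback is a tuple of function names, the group of interest is the largest set of nodes with identical tracebacks, and the names `torus_neighbors` and `rank_suspects` are hypothetical.

```python
from collections import Counter, defaultdict


def torus_neighbors(coord, dims):
    """Yield the distinct immediate neighbors of coord on a torus (assumed topology)."""
    seen = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edges
            n = tuple(n)
            if n != coord and n not in seen:
                seen.add(n)
                yield n


def rank_suspects(tracebacks, dims):
    """Rank nodes outside the largest traceback group by adjoining group members.

    tracebacks maps a node coordinate to its traceback (a tuple of frame names).
    Returns (node, count) pairs, highest count first.
    """
    # Group nodes whose tracebacks are identical.
    groups = defaultdict(set)
    for node, tb in tracebacks.items():
        groups[tb].add(node)

    # Assumption: the group of interest is the largest one.
    group = max(groups.values(), key=len)

    # Count, for each non-member neighbor, how many group members adjoin it.
    counts = Counter()
    for member in group:
        for n in torus_neighbors(member, dims):
            if n in tracebacks and n not in group:
                counts[n] += 1

    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical 4 x 4 x 1 system: most nodes wait in a barrier, three do not.
dims = (4, 4, 1)
tracebacks = {
    (x, y, 0): ("main", "MPI_Barrier") for x in range(4) for y in range(4)
}
tracebacks[(0, 0, 0)] = ("main", "compute")
tracebacks[(2, 2, 0)] = ("main", "MPI_Recv")
tracebacks[(2, 3, 0)] = ("main", "MPI_Send")

suspects = rank_suspects(tracebacks, dims)
```

In this made-up example, node `(0, 0, 0)` is surrounded on all four sides by members of the waiting group, so it sorts first; the two adjacent outliers each have three group neighbors and sort after it.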
- Research Organization: International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: B591700
- Assignee: International Business Machines Corporation (Armonk, NY)
- Patent Number(s): 7,930,595
- Application Number: 11/425,773
- OSTI ID: 1018213
- Country of Publication: United States
- Language: English
Similar Records
Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system
Patent · Mar 02, 2010 · OSTI ID: 1018213
Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer
Patent · Oct 23, 2012 · OSTI ID: 1018213
Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by routing through transporter nodes
Patent · Nov 16, 2010 · OSTI ID: 1018213