Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
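The aggregation step described in the abstract can be illustrated with a minimal sketch. The record does not specify a data format, a leader-selection policy, or a rule for combining statuses. Everything below is a hypothetical assumption for illustration: the names `ExitStatus`, `AggregatedExitStatus`, `select_job_leader` and `aggregate_exit_statuses`, the "lowest-numbered node leads" policy, and the "highest nonzero exit code wins" rule. The final "sending" step is left to the caller.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ExitStatus:
    """Hypothetical per-node exit status reported to the job leader."""
    node_id: int
    exit_code: int      # assumed nonnegative, POSIX-style (0 = success)
    detail: str = ""    # free-form description of this node's portion of the run


@dataclass
class AggregatedExitStatus:
    """Hypothetical summary the job leader would send for the whole subset."""
    job_leader: int
    node_count: int
    failed_nodes: List[int]
    code_counts: Dict[int, int]  # exit code -> number of nodes reporting it

    @property
    def job_exit_code(self) -> int:
        # Assumption: the job succeeds only if every node exited with 0;
        # otherwise report the highest nonzero exit code seen.
        nonzero = [code for code in self.code_counts if code != 0]
        return max(nonzero) if nonzero else 0


def select_job_leader(subset: List[int]) -> int:
    # Hypothetical policy: the lowest-numbered node in the subset is the leader.
    return min(subset)


def aggregate_exit_statuses(subset: List[int],
                            statuses: Iterable[ExitStatus]) -> AggregatedExitStatus:
    """Combine one exit status per node in `subset` into a single summary."""
    leader = select_job_leader(subset)
    by_node = {s.node_id: s for s in statuses}
    missing = sorted(set(subset) - set(by_node))
    if missing:
        # Every node in the subset must report before the leader aggregates.
        raise ValueError(f"no exit status from nodes {missing}")
    codes = Counter(by_node[n].exit_code for n in subset)
    failed = sorted(n for n in subset if by_node[n].exit_code != 0)
    return AggregatedExitStatus(leader, len(subset), failed, dict(codes))


if __name__ == "__main__":
    subset = [4, 5, 6, 7]
    reports = [ExitStatus(4, 0), ExitStatus(5, 0),
               ExitStatus(6, 139, "segfault"), ExitStatus(7, 0)]
    agg = aggregate_exit_statuses(subset, reports)
    print(agg.job_leader, agg.failed_nodes, agg.job_exit_code)  # 4 [6] 139
```

Returning one summary object means only a single message per job needs to leave the subset, rather than one per compute node. This reflects the motivation for routing statuses through a job leader.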
- Research Organization: International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: B579040
- Assignee: International Business Machines Corporation (Armonk, NY)
- Patent Number(s): 9,086,962
- Application Number: 13/524,602
- OSTI ID: 1195933
- Resource Relation: Patent File Date: 2012 Jun 15
- Country of Publication: United States
- Language: English
Similar Records
Scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the nodes during execution
Distributing an executable job load file to compute nodes in a parallel computer