Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job
Patent
·
OSTI ID:1039560
- Rochester, MN
- Byron, MN
Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.
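The re-launch flow summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. Every name here (`IONode`, `ComputeNode`, `Block`, `link_up`, `relaunch`, the `candidates` pool, and the `launch` callable) is hypothetical and chosen only for illustration. The sketch assumes a connected node is usable when its own link is up and its supporting I/O node is active, and that the caller supplies the pool of candidate replacement nodes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IONode:
    name: str
    active: bool = True  # hypothetical flag: is this I/O node up?


@dataclass
class ComputeNode:
    name: str
    io_node: IONode
    link_up: bool = True  # hypothetical flag: is the link to the I/O node working?

    def is_connected(self) -> bool:
        # "Actively coupled for data communications with an active I/O node"
        return self.link_up and self.io_node.active


@dataclass
class Block:
    nodes: List[ComputeNode]
    connected_node: ComputeNode


def failed_on_connectivity(block: Block) -> bool:
    """Step 1: did the job fail because the connected node lost its I/O node?"""
    return not block.connected_node.is_connected()


def select_alternative(block: Block,
                       candidates: List[ComputeNode]) -> Optional[ComputeNode]:
    """Step 2: pick the first candidate with a live link to an active I/O node."""
    for node in candidates:
        if node is not block.connected_node and node.is_connected():
            return node
    return None


def relaunch(block: Block,
             candidates: List[ComputeNode],
             launch: Callable[[Block], str]) -> str:
    """Reassign the connected node if needed, then re-launch the job."""
    if failed_on_connectivity(block):
        alternative = select_alternative(block, candidates)
        if alternative is None:
            raise RuntimeError("no candidate node reaches an active I/O node")
        # Step 3: the alternative becomes the block's connected node.
        block.connected_node = alternative
    return launch(block)
```

In this sketch the candidate pool is passed in explicitly because the abstract does not say where alternative nodes come from. A caller could pass `block.nodes` to restrict the search to the block itself.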
- Research Organization:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- B554331
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Number(s):
- 8,140,889
- Application Number:
- US patent application 12/861,426
- OSTI ID:
- 1039560
- Country of Publication:
- United States
- Language:
- English
Similar Records
Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report
·
Fri Nov 29 00:00:00 EST 2019
·
OSTI ID:1039560
Distributing an executable job load file to compute nodes in a parallel computer
Patent
·
Tue Aug 09 00:00:00 EDT 2016
·
OSTI ID:1039560
Distributing an executable job load file to compute nodes in a parallel computer
Patent
·
Tue Sep 13 00:00:00 EDT 2016
·
OSTI ID:1039560