Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job
Abstract
Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.
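The re-launch procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`IONode`, `ComputeNode`, `Block`), the `link_up` flag, the `relaunch_failed_job` function, and the idea that the caller supplies a list of candidate nodes are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IONode:
    name: str
    active: bool = True


@dataclass
class ComputeNode:
    name: str
    io_node: Optional[IONode] = None  # supporting I/O node, if any
    link_up: bool = True              # hypothetical flag: link to io_node is healthy

    def is_connected(self) -> bool:
        # "Actively coupled for data communications with an active I/O node"
        return self.io_node is not None and self.io_node.active and self.link_up


@dataclass
class Block:
    nodes: List[ComputeNode]
    connected_node: ComputeNode


def relaunch_failed_job(block: Block,
                        candidates: List[ComputeNode],
                        run_job: Callable[[Block], object]) -> object:
    """Re-launch a failed job, reassigning the connected node if needed."""
    # Step 1: identify whether the failure was a connectivity failure between
    # the block's connected node and its supporting I/O node.
    if block.connected_node.is_connected():
        raise RuntimeError("job did not fail because of I/O connectivity")

    # Step 2: select an alternative node that still reaches an active I/O node.
    alternative = next(
        (n for n in candidates
         if n is not block.connected_node and n.is_connected()),
        None,
    )
    if alternative is None:
        raise RuntimeError("no compute node with an active I/O link is available")

    # Step 3: assign it as the block's connected node and re-launch the job.
    block.connected_node = alternative
    return run_job(block)
```

In this sketch the caller decides which nodes are eligible candidates; the abstract does not say where the alternative connected node is drawn from, so that choice is left open here.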
- Inventors:
- Rochester, MN
- Byron, MN
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1039560
- Patent Number(s):
- 8140889
- Application Number:
- US patent application 12/861,426
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, and Stockdell, William M. Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job. United States: N. p., 2012. Web.
Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, & Stockdell, William M. Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job. United States.
Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, and Stockdell, William M. 2012. "Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job". United States. https://www.osti.gov/servlets/purl/1039560.
@article{osti_1039560,
title = {Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job},
author = {Budnik, Thomas A and Knudson, Brant L and Megerian, Mark G and Miller, Samuel J and Stockdell, William M},
abstractNote = {Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {3}
}