DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job

Abstract

Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.
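The three steps named in the abstract (identify the connectivity failure, select an alternative connected node, reassign and re-launch) can be illustrated with a minimal sketch. This is not the patented implementation: every class, field, and function name below (IONode, ComputeNode, Block, relaunch, the candidates list, the run_job callback) is hypothetical, and the failure check is reduced to a simple link and I/O-node status test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IONode:
    name: str
    active: bool = True


@dataclass
class ComputeNode:
    name: str
    io_node: Optional[IONode] = None  # supporting I/O node, if any (assumed model)
    link_up: bool = True              # state of the link to that I/O node

    def is_connected(self) -> bool:
        # "Actively coupled with an active I/O node" in this toy model.
        return self.io_node is not None and self.io_node.active and self.link_up


@dataclass
class Block:
    nodes: List[ComputeNode]
    connected_node: ComputeNode       # node providing I/O connectivity for the block


def failed_on_connectivity(block: Block) -> bool:
    # Step 1: the job failed because the connected node lost its I/O node.
    return not block.connected_node.is_connected()


def select_alternative(candidates: List[ComputeNode],
                       failed: ComputeNode) -> Optional[ComputeNode]:
    # Step 2: pick the first candidate that still reaches an active I/O node.
    for node in candidates:
        if node is not failed and node.is_connected():
            return node
    return None


def relaunch(block: Block, candidates: List[ComputeNode],
             run_job: Callable[[Block], None]) -> bool:
    # Step 3: reassign the connected node, then re-launch the job.
    if not failed_on_connectivity(block):
        return False                  # failure had some other cause
    alternative = select_alternative(candidates, block.connected_node)
    if alternative is None:
        return False                  # no usable connected node available
    block.connected_node = alternative
    run_job(block)
    return True
```

In this sketch the caller supplies the pool of candidate nodes; whether candidates come from inside or outside the block is left open here, since the abstract does not say.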

Inventors:
 Budnik, Thomas A [1]; Knudson, Brant L [1]; Megerian, Mark G [1]; Miller, Samuel J [1]; Stockdell, William M [2]
  1. Rochester, MN
  2. Byron, MN
Issue Date:
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1039560
Patent Number(s):
8140889
Application Number:
US patent application 12/861,426
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, and Stockdell, William M. Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job. United States: N. p., 2012. Web.
Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, & Stockdell, William M. Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job. United States.
Budnik, Thomas A, Knudson, Brant L, Megerian, Mark G, Miller, Samuel J, and Stockdell, William M. 2012. "Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job". United States. https://www.osti.gov/servlets/purl/1039560.
@article{osti_1039560,
title = {Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job},
author = {Budnik, Thomas A and Knudson, Brant L and Megerian, Mark G and Miller, Samuel J and Stockdell, William M},
abstractNote = {Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {3}
}