Methods, apparatus and system for selective duplication of subtasks
Abstract
A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.
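The four steps in the abstract (monitor, identify, duplicate, notify) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the `failure_likelihood` field, the `spare` node, the `-replica` naming, and the `notify` callback standing in for the messaging library are all hypothetical names introduced here for illustration, and the health metric is assumed to be a number between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical compute node with a health-derived failure estimate."""
    name: str
    failure_likelihood: float  # assumed metric in [0, 1]
    subtasks: List[str] = field(default_factory=list)


def selectively_duplicate(
    nodes: List[Node],
    threshold: float,
    spare: Node,
    notify: Callable[[List[str]], None],
) -> List[str]:
    """Duplicate only the subtasks running on nodes whose failure
    likelihood exceeds the threshold, then notify the messaging layer."""
    duplicated: List[str] = []
    for node in nodes:
        # Identify nodes whose likelihood of failure exceeds the threshold.
        if node.failure_likelihood > threshold:
            for subtask in node.subtasks:
                # Place a replica on the (hypothetical) spare node.
                spare.subtasks.append(subtask + "-replica")
                duplicated.append(subtask)
    # Notify the messaging library only if something was duplicated.
    if duplicated:
        notify(duplicated)
    return duplicated


if __name__ == "__main__":
    cluster = [
        Node("n0", 0.10, ["rank0"]),
        Node("n1", 0.80, ["rank1", "rank2"]),
    ]
    spare_node = Node("spare", 0.05)
    selectively_duplicate(cluster, 0.5, spare_node, print)
```

In this sketch only the subtasks on `n1` are replicated, which is the point of selective duplication: healthy nodes incur no redundancy overhead.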
- Inventors:
- Andrade Costa, Carlos H.; Cher, Chen-Yong; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1244231
- Patent Number(s):
- 9298553
- Application Number:
- 14/176,083
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- H - ELECTRICITY; H04 - ELECTRIC COMMUNICATION TECHNIQUE; H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- DOE Contract Number:
- B599858
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2014 Feb 08
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS
Citation Formats
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., and Ryu, Kyung D. Methods, apparatus and system for selective duplication of subtasks. United States: N. p., 2016. Web.
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., & Ryu, Kyung D. Methods, apparatus and system for selective duplication of subtasks. United States.
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., and Ryu, Kyung D. Tue . "Methods, apparatus and system for selective duplication of subtasks". United States. https://www.osti.gov/servlets/purl/1244231.
@article{osti_1244231,
title = {Methods, apparatus and system for selective duplication of subtasks},
author = {Andrade Costa, Carlos H. and Cher, Chen-Yong and Park, Yoonho and Rosenburg, Bryan S. and Ryu, Kyung D.},
abstractNote = {A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2016},
month = {3}
}