Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
Abstract
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
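The steps listed in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `ExitStatus`, `AggregatedExitStatus`, `select_job_leader`, `aggregate`, and `run_job` are hypothetical, and the leader-selection rule (lowest node id) and the aggregation rule (largest per-node exit code plus a list of failed nodes) are assumptions, since the abstract does not specify either.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExitStatus:
    """Per-node exit status (hypothetical layout)."""
    node_id: int
    exit_code: int   # assumed convention: 0 means the node's portion ran cleanly
    detail: str = ""  # free-form information describing the node's execution


@dataclass
class AggregatedExitStatus:
    """Single status reported for the whole subset (hypothetical layout)."""
    leader_id: int
    node_count: int
    failed_nodes: List[int]
    exit_code: int


def select_job_leader(subset: List[int]) -> int:
    # Assumed policy: the node with the lowest id becomes the job leader.
    return min(subset)


def aggregate(leader_id: int, statuses: List[ExitStatus]) -> AggregatedExitStatus:
    # Assumed policy: report the largest exit code and list every failing node.
    failed = sorted(s.node_id for s in statuses if s.exit_code != 0)
    worst = max((s.exit_code for s in statuses), default=0)
    return AggregatedExitStatus(leader_id, len(statuses), failed, worst)


def run_job(subset: List[int],
            run_on_node: Callable[[int], ExitStatus]) -> AggregatedExitStatus:
    leader = select_job_leader(subset)
    # Sequential stand-in for launching the application on every node
    # and receiving each node's exit status.
    statuses = [run_on_node(node_id) for node_id in subset]
    # The returned value stands in for the aggregated status that is sent on.
    return aggregate(leader, statuses)


if __name__ == "__main__":
    def simulated_node(node_id: int) -> ExitStatus:
        # Simulation only: node 7 fails with exit code 3, all others succeed.
        return ExitStatus(node_id, 3 if node_id == 7 else 0, "simulated run")

    print(run_job([4, 7, 9], simulated_node))
```

In a real parallel computer the per-node statuses would arrive over the machine's interconnect rather than from a local function call; the sketch only shows the data flow from per-node statuses to one aggregated status.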
- Inventors:
- Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.
- Issue Date:
- 2015 Jul 21
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1195933
- Patent Number(s):
- 9086962
- Application Number:
- 13/524,602
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B579040
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2012 Jun 15
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Aho, Michael E., Attinella, John E., Gooding, Thomas M., and Mundy, Michael B. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application. United States: N. p., 2015. Web.
Aho, Michael E., Attinella, John E., Gooding, Thomas M., & Mundy, Michael B. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application. United States.
Aho, Michael E., Attinella, John E., Gooding, Thomas M., and Mundy, Michael B. 2015. "Aggregating job exit statuses of a plurality of compute nodes executing a parallel application". United States. https://www.osti.gov/servlets/purl/1195933.
@article{osti_1195933,
title = {Aggregating job exit statuses of a plurality of compute nodes executing a parallel application},
author = {Aho, Michael E. and Attinella, John E. and Gooding, Thomas M. and Mundy, Michael B.},
abstractNote = {Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {7}
}
Works referenced in this record:
Management system and method for parallel computer system
patent, August 1999
- Matsushita, Masayuki; Ugajin, Atsushi
- US Patent Document 5,937,201
Multiprogrammed multiprocessor system with globally controlled communication and signature controlled scheduling
patent, July 2004
- McColl, William Finlay; Hill, Jonathan Murray; Valiant, Leslie G.
- US Patent Document 6,763,519
Ultrascalable Petaflop Parallel Supercomputer
patent-application, January 2009
- Blumrich, Matthias A.; Chen, Dong; Chiu, George
- US Patent Document 11/768905; 20090006808