skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Runtime optimization of an application executing on a parallel computer

Abstract

Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

Publication Date:
Research Org.:
International Business Machines Corporation, Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1164339
Patent Number(s):
8,898,678
Application Number:
13/663,545
Assignee:
International Business Machines Corporation (Armonk, NY) OSTI
DOE Contract Number:
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

None. Runtime optimization of an application executing on a parallel computer. United States: N. p., 2014. Web.
None. Runtime optimization of an application executing on a parallel computer. United States.
None. Tue . "Runtime optimization of an application executing on a parallel computer". United States. doi:. https://www.osti.gov/servlets/purl/1164339.
@article{osti_1164339,
title = {Runtime optimization of an application executing on a parallel computer},
author = {None},
abstractNote = {Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Tue Nov 25 00:00:00 EST 2014},
month = {Tue Nov 25 00:00:00 EST 2014}
}

Patent:

Save / Share:
  • Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all computemore » nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.« less
  • Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all computemore » nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.« less
  • Executing a scatter operation on a parallel computer includes: configuring a send buffer on a logical root, the send buffer having positions, each position corresponding to a ranked node in an operational group of compute nodes and for storing contents scattered to that ranked node; and repeatedly for each position in the send buffer: broadcasting, by the logical root to each of the other compute nodes on a global combining network, the contents of the current position of the send buffer using a bitwise OR operation, determining, by each compute node, whether the current position in the send buffer correspondsmore » with the rank of that compute node, if the current position corresponds with the rank, receiving the contents and storing the contents in a reception buffer of that compute node, and if the current position does not correspond with the rank, discarding the contents.« less
  • Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer or the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node,more » if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data, if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.« less
  • Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregatingmore » each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.« less