skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Distributing an executable job load file to compute nodes in a parallel computer

Abstract

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.

Inventors:
Publication Date:
Research Org.:
International Business Machines Corporation, Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1324403
Patent Number(s):
9,444,908
Application Number:
14/303,208
Assignee:
International Business Machines Corporation (Armonk, NY) DOEEE
Resource Type:
Patent
Resource Relation:
Patent File Date: 2014 Jun 12
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Gooding, Thomas M. Distributing an executable job load file to compute nodes in a parallel computer. United States: N. p., 2016. Web.
Gooding, Thomas M. Distributing an executable job load file to compute nodes in a parallel computer. United States.
Gooding, Thomas M. 2016. "Distributing an executable job load file to compute nodes in a parallel computer". United States. doi:. https://www.osti.gov/servlets/purl/1324403.
@article{osti_1324403,
title = {Distributing an executable job load file to compute nodes in a parallel computer},
author = {Gooding, Thomas M.},
abstractNote = {Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = 2016,
month = 9
}

Patent:

Save / Share:
  • Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
  • Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitionsmore » for execution on the compute nodes specified by the assigned discontiguous node partitions.« less
  • Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collectivemore » network such that the master node is a physical root.« less
  • Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, a RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating,more » by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.« less
  • Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using memory FIFO operation; determining, by the origin DMAmore » engine whether an acknowledgement of the RTS message has been received from the target DMA engine; if the an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.« less