DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
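The steps in the abstract can be illustrated with a small single-process simulation. This is only a sketch of one possible reading of the abstract, not the patented implementation. It assumes the reduction is an element-wise sum, that each core's contribution is split into one chunk per ring, and that cores rotate through the rings with the schedule `(core + step) % num_cores`. None of these details come from the record. The names `ring_allreduce_sum` and `hierarchical_allreduce` are hypothetical, and the "rings" are modeled as plain Python lists rather than network links.

```python
from typing import List

Vector = List[int]


def ring_allreduce_sum(contributions: List[Vector]) -> Vector:
    """Stand-in for an allreduce among the members of one group:
    the element-wise sum of all contributions, which every member receives."""
    return [sum(values) for values in zip(*contributions)]


def hierarchical_allreduce(data: List[List[Vector]]) -> List[List[Vector]]:
    """Simulate the two-phase allreduce; data[node][core] is one core's contribution.

    Assumptions (not from the record): sum reduction, one chunk per ring,
    rotating ring assignment.
    """
    num_nodes = len(data)
    num_cores = len(data[0])
    length = len(data[0][0])
    assert num_nodes >= 2 and num_cores >= 2
    assert length % num_cores == 0, "vector must split evenly into one chunk per ring"
    chunk = length // num_cores

    # global_result[node][core] is filled chunk by chunk with cross-node sums.
    global_result = [
        [[0] * length for _ in range(num_cores)] for _ in range(num_nodes)
    ]

    # Iterative phase: in each step every core joins a ring it has not used yet.
    for step in range(num_cores):
        for core in range(num_cores):
            ring = (core + step) % num_cores  # assumed rotation schedule
            lo, hi = ring * chunk, (ring + 1) * chunk
            # Global allreduce on this ring: same core index on every node.
            reduced = ring_allreduce_sum(
                [data[node][core][lo:hi] for node in range(num_nodes)]
            )
            for node in range(num_nodes):
                global_result[node][core][lo:hi] = reduced

    # Local phase: each node combines the global results of its own cores.
    output = []
    for node in range(num_nodes):
        total = ring_allreduce_sum(global_result[node])
        output.append([list(total) for _ in range(num_cores)])
    return output


# Two nodes with two cores each; every core ends up with the full sum.
example = [
    [[1, 2, 3, 4], [10, 20, 30, 40]],
    [[100, 200, 300, 400], [1000, 2000, 3000, 4000]],
]
print(hierarchical_allreduce(example)[0][0])  # [1111, 2222, 3333, 4444]
```

In this sketch each ring carries a different chunk at every step, so all rings are busy at once. Whether the patent distributes work this way is not stated in the abstract.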

Inventors:
Faraj, Ahmad
Issue Date:
2013-07-09
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1087901
Patent Number(s):
8484440
Application Number:
12/124,745
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States: N. p., 2013. Web.
Faraj, Ahmad. (2013). Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States.
Faraj, Ahmad. 2013. "Performing an allreduce operation on a plurality of compute nodes of a parallel computer". United States. https://www.osti.gov/servlets/purl/1087901.
@article{osti_1087901,
title = {Performing an allreduce operation on a plurality of compute nodes of a parallel computer},
author = {Faraj, Ahmad},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.},
url = {https://www.osti.gov/servlets/purl/1087901},
place = {United States},
year = {2013},
month = {7}
}
