DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
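The four steps named in the abstract (local reduction, logical rings, per-ring global allreduce, local broadcast) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the abstract does not say how data is divided among representative cores, so the sketch assumes the reduction is an elementwise sum and that each representative core handles one contiguous slice of the vector. All function names (`ring_allreduce`, `hierarchical_allreduce`, `split`, `add`) are hypothetical.

```python
from typing import List

Vector = List[int]


def add(a: Vector, b: Vector) -> Vector:
    """Elementwise sum; stands in for the reduction operator (assumption)."""
    return [x + y for x, y in zip(a, b)]


def split(vec: Vector, parts: int) -> List[Vector]:
    """Cut vec into `parts` contiguous slices, as evenly as possible."""
    size, extra = divmod(len(vec), parts)
    slices, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        slices.append(vec[start:end])
        start = end
    return slices


def ring_allreduce(values: List[Vector]) -> List[Vector]:
    """Simulate a simple ring allreduce: in each of n-1 steps every member
    forwards the message it last received to its right neighbour and
    accumulates the message arriving from its left neighbour."""
    n = len(values)
    totals = [list(v) for v in values]
    in_flight = [list(v) for v in values]
    for _ in range(n - 1):
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [add(t, m) for t, m in zip(totals, in_flight)]
    return totals


def hierarchical_allreduce(contributions: List[List[Vector]],
                           num_reps: int) -> List[List[Vector]]:
    """contributions[node][core] is that core's input vector.
    Returns results[node][core], the reduced vector seen by every core."""
    num_nodes = len(contributions)

    # Step 1: local reduction on each node; representative core r of a
    # node keeps slice r of the node-local sum (slicing is an assumption).
    local = []
    for cores in contributions:
        node_sum = list(cores[0])
        for vec in cores[1:]:
            node_sum = add(node_sum, vec)
        local.append(split(node_sum, num_reps))

    # Steps 2 and 3: logical ring r contains representative core r of
    # every node; each ring runs its own global allreduce.
    global_slices = [[[] for _ in range(num_reps)] for _ in range(num_nodes)]
    for r in range(num_reps):
        ring_result = ring_allreduce([local[n][r] for n in range(num_nodes)])
        for n in range(num_nodes):
            global_slices[n][r] = ring_result[n]

    # Step 4: local broadcast; every core on a node receives the
    # concatenation of that node's representative-core results.
    results = []
    for n, cores in enumerate(contributions):
        full = [x for piece in global_slices[n] for x in piece]
        results.append([list(full) for _ in cores])
    return results


if __name__ == "__main__":
    data = [
        [[1, 2, 3, 4], [10, 20, 30, 40]],                  # node 0, two cores
        [[100, 200, 300, 400], [1000, 2000, 3000, 4000]],  # node 1, two cores
    ]
    for node in hierarchical_allreduce(data, num_reps=2):
        print(node)
```

With two representative cores per node, the sketch builds two logical rings, each carrying half of the vector, so the inter-node traffic is spread across both rings while intra-node work stays local.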

Inventors:
Faraj, Ahmad
Issue Date:
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1082948
Patent Number(s):
8375197
Application Number:
12/124,763
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States: N. p., 2013. Web.
Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States.
Faraj, Ahmad. 2013. "Performing an allreduce operation on a plurality of compute nodes of a parallel computer". United States. https://www.osti.gov/servlets/purl/1082948.
@article{osti_1082948,
title = {Performing an allreduce operation on a plurality of compute nodes of a parallel computer},
author = {Faraj, Ahmad},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2013},
month = {2}
}
