DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation using shared memory

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
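The flow in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: threads stand in for processing cores, a Python list stands in for the shared memory region, and the names `JobStatus`, `claim_next`, `shared_memory_allreduce`, `num_cores`, and `chunk` are hypothetical. Each work unit reduces one slice of the input buffers, and any idle "core" claims the next unclaimed unit from the job status object.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class JobStatus:
    """Hypothetical job status object: lists work units and tracks the next unclaimed one."""
    work_units: list  # (start, stop) index ranges of the result buffer
    next_index: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)

    def claim_next(self):
        """Return the next unclaimed work unit, or None once all are claimed."""
        with self.lock:
            if self.next_index >= len(self.work_units):
                return None
            unit = self.work_units[self.next_index]
            self.next_index += 1
            return unit


def shared_memory_allreduce(contributions, num_cores=4, chunk=2, op=lambda a, b: a + b):
    """Reduce equal-length buffers element-wise and hand every contributor the result."""
    length = len(contributions[0])
    result = [None] * length  # stands in for the shared memory region (assumption)

    # The "core" that received the instruction establishes the job status object.
    job = JobStatus([(s, min(s + chunk, length)) for s in range(0, length, chunk)])

    def core_loop():
        # An available core determines the next work unit, then performs it.
        while (unit := job.claim_next()) is not None:
            start, stop = unit
            for i in range(start, stop):
                acc = contributions[0][i]
                for buf in contributions[1:]:
                    acc = op(acc, buf[i])
                result[i] = acc

    threads = [threading.Thread(target=core_loop) for _ in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Allreduce semantics: every contributor ends up with the full reduced buffer.
    return [list(result) for _ in contributions]
```

For example, `shared_memory_allreduce([[1, 2], [10, 20]])` gives each of the two contributors a copy of the element-wise sum. Claiming units under a lock means cores that finish early pick up the remaining work instead of idling, which is the load-balancing idea the abstract describes.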

Inventors:
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
Issue Date:
Research Org.:
International Business Machines Corporation, Armonk, NY (USA)
Sponsoring Org.:
USDOE
OSTI Identifier:
1134015
Patent Number(s):
8752051
Application Number:
13/427,057
Assignee:
International Business Machines Corporation
Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. Performing an allreduce operation using shared memory. United States: N. p., 2014. Web.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, & Smith, Brian E. Performing an allreduce operation using shared memory. United States.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. 2014. "Performing an allreduce operation using shared memory". United States. https://www.osti.gov/servlets/purl/1134015.
@article{osti_1134015,
title = {Performing an allreduce operation using shared memory},
author = {Archer, Charles J and Dozsa, Gabor and Ratterman, Joseph D and Smith, Brian E},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {6}
}
