DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation using shared memory

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
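
The flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: threads stand in for processing cores, plain Python lists stand in for shared memory buffers, and the names `JobStatus`, `claim_next`, `worker`, and `shared_memory_allreduce`, as well as the choice to split the work into fixed-size chunks, are hypothetical and do not come from the record.

```python
import threading


class JobStatus:
    """Hypothetical job status object: lists the work units and hands them out."""

    def __init__(self, length, chunk):
        # Assumption: each work unit covers one contiguous slice [start, stop).
        self.units = [(s, min(s + chunk, length)) for s in range(0, length, chunk)]
        self._next = 0
        self._lock = threading.Lock()

    def claim_next(self):
        """Return the next unclaimed work unit, or None once all are claimed."""
        with self._lock:
            if self._next >= len(self.units):
                return None
            unit = self.units[self._next]
            self._next += 1
            return unit


def worker(job, inputs, result, op):
    # An available "core" repeatedly determines and performs the next work unit.
    while (unit := job.claim_next()) is not None:
        start, stop = unit
        for i in range(start, stop):
            acc = inputs[0][i]
            for buf in inputs[1:]:
                acc = op(acc, buf[i])
            result[i] = acc  # every core can read the shared result afterwards


def shared_memory_allreduce(inputs, op, chunk=2):
    """Reduce one input buffer per core element-wise into a shared result."""
    length = len(inputs[0])
    result = [None] * length
    job = JobStatus(length, chunk)  # established when the instruction arrives
    threads = [
        threading.Thread(target=worker, args=(job, inputs, result, op))
        for _ in inputs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Calling `shared_memory_allreduce([[1, 2, 3], [10, 20, 30]], lambda a, b: a + b)` sums the two buffers element by element. Because work units are claimed one at a time from the job status object, a core that finishes early simply picks up the next unit instead of idling.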

Inventors:
Archer, Charles J [1]; Dozsa, Gabor [2]; Ratterman, Joseph D [1]; Smith, Brian E [1]
  1. Rochester, MN
  2. Ardsley, NY
Issue Date:
2012-04-17
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1040779
Patent Number(s):
8161480
Application Number:
11/754,782
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Resource Relation:
Patent File Date: 2007 May 29
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. Performing an allreduce operation using shared memory. United States: N. p., 2012. Web.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, & Smith, Brian E. Performing an allreduce operation using shared memory. United States.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. 2012. "Performing an allreduce operation using shared memory". United States. https://www.osti.gov/servlets/purl/1040779.
@article{osti_1040779,
title = {Performing an allreduce operation using shared memory},
author = {Archer, Charles J and Dozsa, Gabor and Ratterman, Joseph D and Smith, Brian E},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {4}
}
