OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation using shared memory

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
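The scheme in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: threads stand in for processing cores, the reduction is assumed to be an element-wise sum, and the names `JobStatus`, `claim_next`, `shared_memory_allreduce`, `num_cores`, and `chunk` are hypothetical. One caller establishes a job status object listing the work units, and each available "core" repeatedly determines the next unclaimed unit and performs it.

```python
import threading


class JobStatus:
    """Hypothetical job status object: a list of work units plus a cursor."""

    def __init__(self, work_units):
        self.work_units = work_units
        self._next = 0
        self._lock = threading.Lock()

    def claim_next(self):
        """Return the next unclaimed work unit, or None when all are claimed."""
        with self._lock:
            if self._next >= len(self.work_units):
                return None
            unit = self.work_units[self._next]
            self._next += 1
            return unit


def shared_memory_allreduce(contributions, num_cores=4, chunk=2):
    """Sum equal-length buffers element-wise into one shared result buffer."""
    n = len(contributions[0])
    result = [0] * n  # shared buffer that every "core" can read afterwards

    # Each work unit is an index range (start, stop) of the result to reduce.
    units = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    job = JobStatus(units)

    def core():
        # An available core keeps claiming and performing work units.
        while True:
            unit = job.claim_next()
            if unit is None:
                return
            start, stop = unit
            for i in range(start, stop):
                result[i] = sum(buf[i] for buf in contributions)

    threads = [threading.Thread(target=core) for _ in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


contributions = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
print(shared_memory_allreduce(contributions))  # [111, 222, 333, 444, 555]
```

Because work units cover disjoint index ranges, the workers in this sketch never write to the same element, so only the cursor in the job status object needs a lock.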

Inventors:
 Archer, Charles J [1]; Dozsa, Gabor [2]; Ratterman, Joseph D [1]; Smith, Brian E [1]
  1. Rochester, MN
  2. Ardsley, NY
Publication Date:
April 2012
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1040779
Patent Number(s):
8,161,480
Application Number:
11/754,782
Assignee:
International Business Machines Corporation (Armonk, NY)
DOE Contract Number:  
B554331
Resource Type:
Patent
Resource Relation:
Patent File Date: 2007 May 29
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. Performing an allreduce operation using shared memory. United States: N. p., 2012. Web.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, & Smith, Brian E. Performing an allreduce operation using shared memory. United States.
Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. 2012. "Performing an allreduce operation using shared memory". United States. https://www.osti.gov/servlets/purl/1040779.
@article{osti_1040779,
title = {Performing an allreduce operation using shared memory},
author = {Archer, Charles J and Dozsa, Gabor and Ratterman, Joseph D and Smith, Brian E},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.},
place = {United States},
year = {2012},
month = {4}
}

Works referenced in this record:

Extending the message passing interface (MPI)
conference, January 1995

  • Skjellum, A.; Doss, N. E.; Viswanathan, K.
  • Proceedings Scalable Parallel Libraries Conference
  • DOI: 10.1109/SPLC.1994.376998

Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks
conference, September 2006

  • Matsuda, Motohiko; Kudoh, Tomohiro; Kodama, Yuetsu
  • 2006 IEEE International Conference on Cluster Computing
  • DOI: 10.1109/CLUSTR.2006.311848

Interleaved all-to-all reliable broadcast on meshes and hypercubes
journal, May 1994

  • Lee, Sunggu; Shin, K. G.
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 5, Issue 5
  • DOI: 10.1109/71.282556

Optimizing threaded MPI execution on SMP clusters
conference, January 2001

  • Tang, Hong; Yang, Tao
  • Proceedings of the 15th international conference on Supercomputing - ICS '01
  • DOI: 10.1145/377792.377895

Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997

  • Bruck, J.; Kipnis, S.
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Issue 11
  • DOI: 10.1109/71.642949

Computing parallel prefix and reduction using coterie structures
conference, January 1992

  • Herbordt, M. C.; Weems, C. C.
  • [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
  • DOI: 10.1109/FMPC.1992.234895

Optimization of MPI collectives on clusters of large-scale SMP's
conference, January 1999

  • Sistare, Steve; vandeVaart, Rolf; Loh, Eugene
  • Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
  • DOI: 10.1145/331532.331555

Universality of mixed action extrapolation formulae
journal, April 2009

  • Chen, Jiunn-Wei; Walker-Loud, André; O'Connell, Donal
  • Journal of High Energy Physics, Vol. 2009, Issue 04
  • DOI: 10.1088/1126-6708/2009/04/090

Coprocessor design to support MPI primitives in configurable multiprocessors
journal, April 2007

  • Ziavras, Sotirios G.; Gerbessiotis, Alexandros V.; Bafna, Rohan
  • Integration, the VLSI Journal, Vol. 40, Issue 3, p. 235-252
  • DOI: 10.1016/j.vlsi.2005.10.001

Computing the Hough transform on a scan line array processor (image processing)
journal, March 1989

  • Fisher, A. L.; Highnam, P. T.
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, Issue 3
  • DOI: 10.1109/34.21795

DADO: A tree-structured machine architecture for production systems
report, March 1982

  • Stolfo, Salvatore; Shaw, David Elliot
  • Columbia University, 15 p.
  • CUCS-24-82
  • DOI: 10.7916/D8SQ97CN

Bandwidth Efficient All-reduce Operation on Tree Topologies
conference, March 2007

  • Patarasuk, Pitch; Yuan, Xin
  • 2007 IEEE International Parallel and Distributed Processing Symposium
  • DOI: 10.1109/IPDPS.2007.370405