DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
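The ring-then-local structure described in the abstract can be illustrated with a small single-process simulation. This is a simplified sketch, not the patented method: it assumes a fixed assignment in which ring r links core r on every node, it runs a single round rather than the iterative reassignment of cores to rings, and the function names `ring_allreduce` and `two_level_allreduce` are hypothetical. The reduction operator is assumed to be associative and commutative.

```python
from functools import reduce
import operator


def ring_allreduce(values, op=operator.add):
    """Simulate an allreduce around one logical ring.

    values[i] is the contribution of ring member i. After len(values) - 1
    forwarding steps every member holds the reduction of all contributions.
    Assumes op is associative and commutative.
    """
    n = len(values)
    acc = list(values)       # running result held by each member
    passing = list(values)   # value each member forwards in the current step
    for _ in range(n - 1):
        # Each member receives what its left neighbour forwarded last step.
        passing = [passing[(i - 1) % n] for i in range(n)]
        acc = [op(acc[i], passing[i]) for i in range(n)]
    return acc


def two_level_allreduce(contrib, op=operator.add):
    """contrib[node][core] -> result[node][core].

    Step 1 (global): ring r links core r on every node (an assumed, fixed
    assignment) and performs an allreduce over those cores' contributions.
    Step 2 (local): each node combines the per-ring results of its own cores,
    so every core ends up with the reduction of all contributions.
    """
    num_nodes = len(contrib)
    cores_per_node = len(contrib[0])

    ring_results = [[None] * cores_per_node for _ in range(num_nodes)]
    for r in range(cores_per_node):
        reduced = ring_allreduce([contrib[n][r] for n in range(num_nodes)], op)
        for n in range(num_nodes):
            ring_results[n][r] = reduced[n]

    result = []
    for n in range(num_nodes):
        node_total = reduce(op, ring_results[n])  # local allreduce on node n
        result.append([node_total] * cores_per_node)
    return result


if __name__ == "__main__":
    # Three nodes with two cores each; every core contributes one integer.
    print(two_level_allreduce([[1, 2], [3, 4], [5, 6]]))
```

In this sketch the per-ring global step and the per-node local step are kept separate so that each ring only carries traffic for its own subset of cores, which is the general idea the abstract describes; how cores are reassigned across rings between iterations is not modelled here.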

Inventors:
Faraj, Ahmad
Issue Date:
July 2013
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1087901
Patent Number(s):
8484440
Application Number:
12/124,745
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States: N. p., 2013. Web.
Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States.
Faraj, Ahmad. "Performing an allreduce operation on a plurality of compute nodes of a parallel computer". United States. https://www.osti.gov/servlets/purl/1087901.
@article{osti_1087901,
title = {Performing an allreduce operation on a plurality of compute nodes of a parallel computer},
author = {Faraj, Ahmad},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2013},
month = {7}
}

Works referenced in this record:

Interleaved all-to-all reliable broadcast on meshes and hypercubes
journal, May 1994


Performance analysis and optimization of MPI collective operations on multi-core clusters
journal, April 2009


Computing the Hough transform on a scan line array processor (image processing)
journal, March 1989


Building packet buffers using interleaved memories
conference, January 2005


Bandwidth Efficient All-reduce Operation on Tree Topologies
conference, March 2007


Computing parallel prefix and reduction using coterie structures
conference, January 1992

  • Herbordt, M. C.; Weems, C. C.
  • [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
  • https://doi.org/10.1109/FMPC.1992.234895

Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues
journal, January 2003


Optimizing threaded MPI execution on SMP clusters
conference, January 2001


Abmahnung statt Jobverlust: Kündigung
journal, August 2010


Extending the message passing interface (MPI)
conference, January 1995


Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks
conference, September 2006


Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997


Coprocessor design to support MPI primitives in configurable multiprocessors
journal, April 2007


Optimization of MPI collectives on clusters of large-scale SMP's
conference, January 1999


DADO: A tree-structured machine architecture for production systems
report, March 1982