DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
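The three steps named in the abstract (establish logical rings, run a global allreduce per ring, run a local allreduce per node) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the function names `ring_allreduce` and `hierarchical_allreduce` are hypothetical, ring r is assumed to link core r of every node, the reduction operator is assumed to be associative and commutative, and the ring step uses a naive pass-around scheme because the abstract does not say how the per-ring allreduce is carried out.

```python
from functools import reduce
import operator


def ring_allreduce(values, op=operator.add):
    """Simulate an allreduce over one logical ring (naive pass-around).

    Assumption: op is associative and commutative. In each of the n - 1
    steps, every member forwards the value it last received to its ring
    successor and folds the incoming value into its running result.
    """
    n = len(values)
    acc = list(values)        # running result held by each ring member
    in_flight = list(values)  # value each member forwards next
    for _ in range(n - 1):
        # Member i receives from its predecessor (i - 1) mod n.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        acc = [op(acc[i], in_flight[i]) for i in range(n)]
    return acc


def hierarchical_allreduce(contrib, op=operator.add):
    """contrib[node][core] holds each core's contribution data.

    Returns result[node][core], where every entry is the full reduction.
    """
    num_nodes = len(contrib)
    cores_per_node = len(contrib[0])

    # Step 1: establish logical rings. Hypothetical layout: ring r
    # contains core r of every compute node.
    rings = [[(node, r) for node in range(num_nodes)]
             for r in range(cores_per_node)]

    # Step 2: global allreduce within each ring.
    global_result = [[None] * cores_per_node for _ in range(num_nodes)]
    for ring in rings:
        reduced = ring_allreduce([contrib[n][c] for n, c in ring], op)
        for (n, c), value in zip(ring, reduced):
            global_result[n][c] = value

    # Step 3: local allreduce on each node over its cores' global results.
    result = []
    for node in range(num_nodes):
        local = reduce(op, global_result[node])
        result.append([local] * cores_per_node)
    return result


if __name__ == "__main__":
    # Three compute nodes with two processing cores each.
    print(hierarchical_allreduce([[1, 2], [3, 4], [5, 6]]))
```

With three nodes of two cores each, ring 0 reduces the contributions 1, 3, 5 and ring 1 reduces 2, 4, 6; the local step on each node then combines the two ring results, so every core ends up holding the reduction of all six contributions.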

Inventors:
Faraj, Ahmad (Rochester, MN)
Issue Date:
2012 April
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1040781
Patent Number(s):
8161268
Application Number:
12/124,756
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS
G06 - COMPUTING
G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Resource Relation:
Patent File Date: 2008 May 21
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States: N. p., 2012. Web.
Faraj, Ahmad. 2012. "Performing an allreduce operation on a plurality of compute nodes of a parallel computer". United States. https://www.osti.gov/servlets/purl/1040781.
@article{osti_1040781,
title = {Performing an allreduce operation on a plurality of compute nodes of a parallel computer},
author = {Faraj, Ahmad},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {4}
}
