Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Abstract
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
- Inventors:
-
- Rochester, MN
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1040781
- Patent Number(s):
- 8161268
- Application Number:
- 12/124,756
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
-
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2008 May 21
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States: N. p., 2012.
Web.
Faraj, Ahmad. Performing an allreduce operation on a plurality of compute nodes of a parallel computer. United States.
Faraj, Ahmad. Tue .
"Performing an allreduce operation on a plurality of compute nodes of a parallel computer". United States. https://www.osti.gov/servlets/purl/1040781.
@article{osti_1040781,
title = {Performing an allreduce operation on a plurality of compute nodes of a parallel computer},
author = {Faraj, Ahmad},
abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2012},
month = {4}
}
Works referenced in this record:
Adaptive Model Trust Region Methods for Generalized Eigenvalue Problems
book, January 2005
- Absil, P. -A.; Baker, C. G.; Gallivan, K. A.
- Lecture Notes in Computer Science
Extending the message passing interface (MPI)
conference, January 1995
- Skjellum, A.; Doss, N. E.; Viswanathan, K.
- Proceedings Scalable Parallel Libraries Conference
Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks
conference, September 2006
- Matsuda, Motohiko; Kudoh, Tomohiro; Kodama, Yuetsu
- 2006 IEEE International Conference on Cluster Computing
Interleaved all-to-all reliable broadcast on meshes and hypercubes
journal, May 1994
- Sunggu Lee, ; Shin, K. G.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 5, Issue 5
Optimizing threaded MPI execution on SMP clusters
conference, January 2001
- Tang, Hong; Yang, Tao
- Proceedings of the 15th international conference on Supercomputing - ICS '01
Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997
- Bruck, J.; Kipnis, S.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Issue 11
Computing parallel prefix and reduction using coterie structures
conference, January 1992
- Herbordt, M. C.; Weems, C. C.
- [1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
Optimization of MPI collectives on clusters of large-scale SMP's
conference, January 1999
- Sistare, Steve; vandeVaart, Rolf; Loh, Eugene
- Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
Universality of mixed action extrapolation formulae
journal, April 2009
- Chen, Jiunn-Wei; Walker-Loud, André; O'Connell, Donal
- Journal of High Energy Physics, Vol. 2009, Issue 04
Coprocessor design to support MPI primitives in configurable multiprocessors
journal, April 2007
- Ziavras, Sotirios G.; Gerbessiotis, Alexandros V.; Bafna, Rohan
- Integration, the VLSI Journal, Vol. 40, Issue 3, p. 235-252
Computing the Hough transform on a scan line array processor (image processing)
journal, March 1989
- Fisher, A. L.; Highnam, P. T.
- IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, Issue 3
DADO: A tree-structured machine architecture for production systems
report, March 1982
- Stolfo, Salvatore; Shaw, David Elliot
- Columbia University, 15 p.
- CUCS-24-82
Bandwidth Efficient All-reduce Operation on Tree Topologies
conference, March 2007
- Patarasuk, Pitch; Yuan, Xin
- 2007 IEEE International Parallel and Distributed Processing Symposium