Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Faraj, Ahmad (Rochester, MN)

Patent · April 17, 2012

OSTI ID: 1040781

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
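The abstract describes two phases: an allreduce inside each logical ring, then an allreduce among the cores of each node. The single-process Python simulation below is a minimal sketch of that structure. It rests on assumptions the abstract leaves open: the reduction operator is integer addition, each core contributes one scalar, every node has the same number of cores, and logical ring r consists of core r on every node. The helper names `build_rings`, `ring_allreduce`, and `allreduce` are hypothetical. The ring phase uses a simple scheme that passes values along the ring, not any specific algorithm claimed in the patent.

```python
from typing import List, Tuple

# Assumptions (not from the patent): sum as the reduction, one integer per core,
# equal core counts per node, and ring r = core r of every node.

Core = Tuple[int, int]  # (node index, core index)


def build_rings(num_nodes: int, cores_per_node: int) -> List[List[Core]]:
    """Hypothetical ring layout: ring r links core r on nodes 0, 1, ..., num_nodes - 1."""
    return [[(node, r) for node in range(num_nodes)] for r in range(cores_per_node)]


def ring_allreduce(values: List[int]) -> List[int]:
    """Sum-allreduce over one logical ring by forwarding values to the right neighbor.

    After n - 1 steps, member i has added the values of members i-1, ..., i-(n-1),
    so every member holds the full sum.
    """
    n = len(values)
    acc = list(values)        # running sum held by each ring member
    in_flight = list(values)  # value each member forwards on the next step
    for _ in range(n - 1):
        # Each member receives what its left neighbor forwarded.
        received = [in_flight[(i - 1) % n] for i in range(n)]
        acc = [a + r for a, r in zip(acc, received)]
        in_flight = received
    return acc


def allreduce(contrib: List[List[int]]) -> List[List[int]]:
    """contrib[node][core] is that core's contribution; returns the result held by each core."""
    num_nodes, cores = len(contrib), len(contrib[0])

    # Phase 1: global allreduce within each logical ring.
    global_result = [[0] * cores for _ in range(num_nodes)]
    for ring in build_rings(num_nodes, cores):
        ring_out = ring_allreduce([contrib[node][core] for node, core in ring])
        for (node, core), value in zip(ring, ring_out):
            global_result[node][core] = value

    # Phase 2: local allreduce over the global results of the cores on each node.
    return [[sum(global_result[node])] * cores for node in range(num_nodes)]


if __name__ == "__main__":
    print(allreduce([[1, 2], [3, 4], [5, 6]]))  # [[21, 21], [21, 21], [21, 21]]
```

Take 3 nodes with 2 cores each and contributions [[1, 2], [3, 4], [5, 6]]. Ring 0 reduces 1 + 3 + 5 = 9 and ring 1 reduces 2 + 4 + 6 = 12. The local phase on each node then adds 9 + 12, so every core ends up with 21. A flat allreduce over all six cores would produce the same value. A plausible motivation for this split is that the rings can proceed concurrently, one per core index, while only the final combination is confined to each node.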


Research Organization: International Business Machines Corp., Armonk, NY (United States)

Sponsoring Organization: USDOE

DOE Contract Number: B554331

Assignee: International Business Machines Corporation (Armonk, NY)

Patent Number(s): 8,161,268

Application Number: 12/124,756

Resource Relation: Patent file date May 21, 2008

Country of Publication: United States

Language: English

Similar Records

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent · July 9, 2013

Faraj, Ahmad

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent · February 12, 2013

Faraj, Ahmad

Effecting a broadcast with an allreduce operation on a parallel computer

Patent · November 2, 2010

Almasi, Gheorghe; Archer, Charles J; Ratterman, Joseph D; et al.

Related Subjects

97 MATHEMATICS AND COMPUTING
