Performing a local reduction operation on a parallel computer
Abstract
A parallel computer includes compute nodes, each of which includes two reduction processing cores, a network write processing core, and a network read processing core, with each processing core assigned an input buffer. A local reduction operation on such a computer includes: copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
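The steps in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the patented implementation: the function name `local_reduce`, the `chunk` size, the choice of which core writes even or odd chunks, and the use of addition as the default reduction operator are all hypothetical, and the two reduction cores are simulated one after the other rather than run in parallel.

```python
from operator import add


def local_reduce(r0, r1, w, n, chunk=2, op=add):
    """Sequentially simulate two reduction cores (hypothetical layout).

    r0, r1: input buffers of the two reduction cores
    w, n:   input buffers of the network write and network read cores
    """
    size = len(r0)
    assert len(r1) == len(w) == len(n) == size
    inputs = (r0, r1)

    # Step 1: the reduction cores fill the shared interleaved buffer with
    # alternating chunks (assumed here: core 0 writes even-numbered chunks,
    # core 1 writes odd-numbered chunks).
    interleaved = [None] * size
    for start in range(0, size, chunk):
        owner = (start // chunk) % 2
        interleaved[start:start + chunk] = inputs[owner][start:start + chunk]

    # Step 2: one reduction core copies the network write core's buffer and
    # the other copies the network read core's buffer into shared memory.
    w_shared = list(w)  # copied by core 0 in this sketch
    n_shared = list(n)  # copied by core 1 in this sketch

    # Step 3: each core reduces every other chunk, namely the chunks the
    # other core placed in the interleaved buffer, combining them with its
    # own input buffer and the two copied network buffers.
    result = [None] * size
    for core in (0, 1):
        for start in range(0, size, chunk):
            if (start // chunk) % 2 == core:
                continue  # this chunk is reduced by the other core
            for i in range(start, min(start + chunk, size)):
                result[i] = op(op(inputs[core][i], interleaved[i]),
                               op(w_shared[i], n_shared[i]))
    return result
```

Under these assumptions each element is reduced exactly once, and the two cores split the work roughly in half. For example, `local_reduce([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100] * 5, [1000] * 5)` returns the element-wise sum of the four buffers.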
- Inventors:
- Blocksome, Michael A; Faraj, Daniel A
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1084349
- Patent Number(s):
- 8458244
- Application Number:
- 13/585,993
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Blocksome, Michael A, and Faraj, Daniel A. Performing a local reduction operation on a parallel computer. United States: N. p., 2013. Web.
Blocksome, Michael A, & Faraj, Daniel A. Performing a local reduction operation on a parallel computer. United States.
Blocksome, Michael A, and Faraj, Daniel A. "Performing a local reduction operation on a parallel computer". United States. https://www.osti.gov/servlets/purl/1084349.
@article{osti_1084349,
title = {Performing a local reduction operation on a parallel computer},
author = {Blocksome, Michael A and Faraj, Daniel A},
abstractNote = {A parallel computer includes compute nodes, each of which includes two reduction processing cores, a network write processing core, and a network read processing core, with each processing core assigned an input buffer. A local reduction operation on such a computer includes: copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2013},
month = {6}
}
Works referenced in this record:
Interleaved all-to-all reliable broadcast on meshes and hypercubes
journal, May 1994
- Lee, Sunggu; Shin, K. G.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 5, Issue 5
Computing the Hough transform on a scan line array processor (image processing)
journal, March 1989
- Fisher, A. L.; Highnam, P. T.
- IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, Issue 3
Building packet buffers using interleaved memories
conference, January 2005
- Shrimali, G.; McKeown, N.
- HPSR. 2005 Workshop on High Performance Switching and Routing, 2005.
Bandwidth Efficient All-reduce Operation on Tree Topologies
conference, March 2007
- Patarasuk, Pitch; Yuan, Xin
- 2007 IEEE International Parallel and Distributed Processing Symposium
Computing parallel prefix and reduction using coterie structures
conference, January 1992
- Herbordt, M. C.; Weems, C. C.
- [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues
journal, January 2003
- Keller, Rainer; Gabriel, Edgar; Krammer, Bettina
- Journal of Grid Computing, Vol. 1, Issue 2
Optimizing threaded MPI execution on SMP clusters
conference, January 2001
- Tang, Hong; Yang, Tao
- Proceedings of the 15th international conference on Supercomputing - ICS '01
Extending the message passing interface (MPI)
conference, January 1995
- Skjellum, A.; Doss, N. E.; Viswanathan, K.
- Proceedings Scalable Parallel Libraries Conference
Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks
conference, September 2006
- Matsuda, Motohiko; Kudoh, Tomohiro; Kodama, Yuetsu
- 2006 IEEE International Conference on Cluster Computing
Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997
- Bruck, J.; Kipnis, S.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Issue 11
Coprocessor design to support MPI primitives in configurable multiprocessors
journal, April 2007
- Ziavras, Sotirios G.; Gerbessiotis, Alexandros V.; Bafna, Rohan
- Integration, the VLSI Journal, Vol. 40, Issue 3, p. 235-252
Optimization of MPI collectives on clusters of large-scale SMP's
conference, January 1999
- Sistare, Steve; vandeVaart, Rolf; Loh, Eugene
- Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
DADO: A tree-structured machine architecture for production systems
report, March 1982
- Stolfo, Salvatore; Shaw, David Elliot
- Columbia University, 15 p.
- CUCS-24-82