OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Performing a local reduction operation on a parallel computer

Abstract

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
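The data movement described in the abstract can be illustrated with a small sequential simulation. This is only a sketch of one possible reading, not the patented implementation: it assumes all four input buffers have equal length, that the interleaved buffer takes even-numbered chunks from the first reduction core and odd-numbered chunks from the second, that each reduction core then reduces the chunks for which the other core's data sits in the interleaved buffer, and that the reduction operator is elementwise addition by default. The function name `local_reduce` and the parameters `r0`, `r1`, `w`, `n`, `chunk`, and `op` are hypothetical names introduced for illustration.

```python
from operator import add


def local_reduce(r0, r1, w, n, chunk=2, op=add):
    """Sequentially simulate the four-buffer local reduction (illustrative only).

    r0, r1: input buffers of the two reduction cores (hypothetical names)
    w, n:   input buffers of the network write and network read cores
    """
    size = len(r0)
    assert len(r1) == len(w) == len(n) == size
    chunks = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]

    # Step 1: the reduction cores copy their input buffers, in interleaved
    # chunks, into one shared buffer. Assumption: even chunks come from r0,
    # odd chunks from r1.
    interleaved = [None] * size
    for i, (lo, hi) in enumerate(chunks):
        src = r0 if i % 2 == 0 else r1
        interleaved[lo:hi] = src[lo:hi]

    # Steps 2 and 3: one reduction core copies the network write core's
    # buffer to shared memory, the other copies the network read core's.
    w_shared = list(w)
    n_shared = list(n)

    # Step 4: each reduction core handles every other chunk. For a given
    # chunk, the working core is the one whose data is NOT in the interleaved
    # buffer, so it combines its own input, the interleaved chunk, and the
    # two copied network buffers. On real hardware the two cores would run
    # their halves of this loop concurrently.
    result = [None] * size
    for i, (lo, hi) in enumerate(chunks):
        own = r1 if i % 2 == 0 else r0
        for j in range(lo, hi):
            result[j] = op(op(op(own[j], interleaved[j]), w_shared[j]), n_shared[j])
    return result, interleaved


if __name__ == "__main__":
    out, shared = local_reduce(
        [1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100] * 5, [1000] * 5
    )
    print(shared)  # [1, 2, 30, 40, 5]
    print(out)     # [1111, 1122, 1133, 1144, 1155]
```

Under these assumptions each reduction core touches only half of the chunks, and every chunk still combines all four inputs exactly once, which is the load-balancing effect the abstract appears to describe.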

Inventors:
Blocksome, Michael A.; Faraj, Daniel A.
Publication Date:
2012-12-11
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1082353
Patent Number(s):
8,332,460
Application Number:
12/760,020
Assignee:
International Business Machines Corporation (Armonk, NY)
DOE Contract Number:  
B554331
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Blocksome, Michael A., and Faraj, Daniel A. Performing a local reduction operation on a parallel computer. United States: N. p., 2012. Web.
Blocksome, Michael A., & Faraj, Daniel A. Performing a local reduction operation on a parallel computer. United States.
Blocksome, Michael A., and Faraj, Daniel A. 2012. "Performing a local reduction operation on a parallel computer". United States. https://www.osti.gov/servlets/purl/1082353.
@article{osti_1082353,
title = {Performing a local reduction operation on a parallel computer},
author = {Blocksome, Michael A. and Faraj, Daniel A.},
abstractNote = {A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.},
url = {https://www.osti.gov/biblio/1082353},
place = {United States},
year = {2012},
month = {12}
}
