Performing an allreduce operation using shared memory

Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

Title: Performing an allreduce operation using shared memory

Abstract

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
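The scheme in the abstract can be illustrated with a small sketch. This is not the patented implementation: it only assumes that threads stand in for processing cores, that the reduction operator is an element-wise sum, and that each work unit covers a fixed-size chunk of the buffer. The names `JobStatus`, `claim_next`, `perform`, `core_loop`, and `shared_memory_allreduce` are hypothetical and do not come from the patent.

```python
import threading


class JobStatus:
    """Hypothetical job status object: lists the work units and tracks the next one."""

    def __init__(self, contributions, chunk):
        self.contributions = contributions            # one equal-length list per core
        self.chunk = chunk                            # elements handled by each work unit
        self.result = [0] * len(contributions[0])     # shared result buffer
        self.num_units = (len(self.result) + chunk - 1) // chunk
        self._next = 0                                # index of the next unclaimed unit
        self._lock = threading.Lock()

    def claim_next(self):
        """Return the next unclaimed work unit index, or None when none remain."""
        with self._lock:
            if self._next >= self.num_units:
                return None
            unit = self._next
            self._next += 1
            return unit

    def perform(self, unit):
        """One work unit: reduce (assumed: sum) one chunk across all contributions."""
        lo = unit * self.chunk
        hi = min(lo + self.chunk, len(self.result))
        for i in range(lo, hi):
            self.result[i] = sum(c[i] for c in self.contributions)


def core_loop(job):
    # An available core keeps taking the next work unit until none are left.
    while True:
        unit = job.claim_next()
        if unit is None:
            return
        job.perform(unit)


def shared_memory_allreduce(contributions, chunk=2):
    # The job status object is set up once, by whichever core received the instruction.
    job = JobStatus(contributions, chunk)
    cores = [threading.Thread(target=core_loop, args=(job,)) for _ in contributions]
    for t in cores:
        t.start()
    for t in cores:
        t.join()
    # Every core reads the same shared result, which is what makes this an allreduce.
    return [list(job.result) for _ in contributions]


if __name__ == "__main__":
    print(shared_memory_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

Because each work unit writes a disjoint slice of the shared result, only the "next unit" counter needs a lock in this sketch; cores that finish early simply pick up more units.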

Inventors:

Archer, Charles J [1]; Dozsa, Gabor [2]; Ratterman, Joseph D [1]; Smith, Brian E [1]

[1] Rochester, MN
[2] Ardsley, NY

Issue Date: April 17, 2012

Research Org.: International Business Machines Corp., Armonk, NY (United States)

Sponsoring Org.: USDOE

OSTI Identifier: 1040779

Patent Number(s): 8161480

Application Number: 11/754,782

Assignee: International Business Machines Corporation (Armonk, NY)

Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
G06F9/4843 - {by program, e.g. task dispatcher, supervisor, operating system}
G06F9/52 - Program synchronisation
G06F9/546 - {Message passing systems or structures, e.g. queues}

DOE Contract Number: B554331

Resource Type: Patent

Resource Relation: Patent File Date: May 29, 2007

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING

Citation Formats


MLA: Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. Performing an allreduce operation using shared memory. United States: N. p., 2012. Web.

APA: Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, & Smith, Brian E. Performing an allreduce operation using shared memory. United States.

Chicago: Archer, Charles J, Dozsa, Gabor, Ratterman, Joseph D, and Smith, Brian E. 2012. "Performing an allreduce operation using shared memory". United States. https://www.osti.gov/servlets/purl/1040779.


                    
BibTeX:

@article{osti_1040779,
  title        = {Performing an allreduce operation using shared memory},
  author       = {Archer, Charles J and Dozsa, Gabor and Ratterman, Joseph D and Smith, Brian E},
  abstractNote = {Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.},
  place        = {United States},
  year         = {2012},
  month        = {4}
}

Works referenced in this record:

Extending the message passing interface (MPI)
conference, January 1995

Skjellum, A.; Doss, N. E.; Viswanathan, K.
Proceedings Scalable Parallel Libraries Conference
https://doi.org/10.1109/SPLC.1994.376998

Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks
conference, September 2006

Matsuda, Motohiko; Kudoh, Tomohiro; Kodama, Yuetsu
2006 IEEE International Conference on Cluster Computing
https://doi.org/10.1109/CLUSTR.2006.311848

Interleaved all-to-all reliable broadcast on meshes and hypercubes
journal, May 1994

Lee, Sunggu; Shin, K. G.
IEEE Transactions on Parallel and Distributed Systems, Vol. 5, Issue 5
https://doi.org/10.1109/71.282556

Optimizing threaded MPI execution on SMP clusters
conference, January 2001

Tang, Hong; Yang, Tao
Proceedings of the 15th international conference on Supercomputing - ICS '01
https://doi.org/10.1145/377792.377895

An All-Reduce Operation in Star Networks Using All-to-All Broadcast Communication Pattern
book, January 2005

Oh, Eunseuk; Choi, Hongsik; Primeaux, David
Lecture Notes in Computer Science
https://doi.org/10.1007/11428831_52

Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997

Bruck, J.; Kipnis, S.
IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Issue 11
https://doi.org/10.1109/71.642949

Computing parallel prefix and reduction using coterie structures
conference, January 1992

Herbordt, M. C.; Weems, C. C.
[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
https://doi.org/10.1109/FMPC.1992.234895

Optimization of MPI collectives on clusters of large-scale SMP's
conference, January 1999

Sistare, Steve; vandeVaart, Rolf; Loh, Eugene
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
https://doi.org/10.1145/331532.331555

Universality of mixed action extrapolation formulae
journal, April 2009

Chen, Jiunn-Wei; Walker-Loud, André; O'Connell, Donal
Journal of High Energy Physics, Vol. 2009, Issue 04
https://doi.org/10.1088/1126-6708/2009/04/090

Coprocessor design to support MPI primitives in configurable multiprocessors
journal, April 2007

Ziavras, Sotirios G.; Gerbessiotis, Alexandros V.; Bafna, Rohan
Integration, the VLSI Journal, Vol. 40, Issue 3, p. 235-252
https://doi.org/10.1016/j.vlsi.2005.10.001

Computing the Hough transform on a scan line array processor (image processing)
journal, March 1989

Fisher, A. L.; Highnam, P. T.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, Issue 3
https://doi.org/10.1109/34.21795

DADO: A tree-structured machine architecture for production systems
report, March 1982

Stolfo, Salvatore; Shaw, David Elliot
Columbia University, 15 p.
CUCS-24-82
https://doi.org/10.7916/D8SQ97CN

Bandwidth Efficient All-reduce Operation on Tree Topologies
conference, March 2007

Patarasuk, Pitch; Yuan, Xin
2007 IEEE International Parallel and Distributed Processing Symposium
https://doi.org/10.1109/IPDPS.2007.370405

Similar Records in DOE Patents and OSTI.GOV collections:

Performing an allreduce operation using shared memory

Patent Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; ...

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, ...

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent Faraj, Ahmad

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local ...

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent Faraj, Ahmad [Rochester, MN]

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, ...

Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent Faraj, Ahmad

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for ...

Performing process migration with allreduce operations

Patent Archer, Charles Jens [Rochester, MN]; Peters, Amanda [Rochester, MN]; Wallenfelt, Brian Paul [Plymouth, MN]

Compute nodes perform allreduce operations that swap processes at nodes. A first allreduce operation generates a first result and uses a first process from a first compute node, a second process from a second compute node, and zeros from other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result and uses the first result from the first compute node, the second process from the second compute node, and zeros from others. The second compute node replaces the second process with the second result, which is the first ...
