OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent
OSTI ID: 1087901

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include:
  • establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes;
  • iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and
  • performing, for each node, a local allreduce operation using the global allreduce results.
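The abstract states the scheme only in claim language. The Python sketch below simulates one plausible reading of it on a single machine. The data layout (each core's contribution split into one chunk per logical ring), the rotation rule `ring = (core + iteration) % cores`, the use of summation as the reduction, and the helper names `ring_allreduce` and `allreduce_with_rings` are assumptions made for illustration, not details taken from the patent. Each ring's global allreduce is modelled as a simple ring pass in which every value circulates once around the ring.

```python
from typing import List

# Hypothetical layout (assumption): contrib[node][core][chunk], one chunk per logical ring.
Contribution = List[List[List[float]]]


def ring_allreduce(values: List[float]) -> List[float]:
    """Sum one value per ring member by circulating messages around the ring.

    After len(values) - 1 steps every member has received every other member's
    value once, so each member ends up holding the full sum.
    """
    n = len(values)
    acc = list(values)  # running total held by each member
    msg = list(values)  # value each member forwards on the next step
    for _ in range(n - 1):
        # member k receives the message from its predecessor (k - 1) on the ring
        msg = [msg[(k - 1) % n] for k in range(n)]
        acc = [a + m for a, m in zip(acc, msg)]
    return acc


def allreduce_with_rings(contrib: Contribution) -> Contribution:
    """Simulate a ring-based global phase followed by a per-node local phase.

    Assumes every node has the same number of cores C, that there are C
    logical rings, and that each core's contribution is split into C chunks.
    """
    num_nodes = len(contrib)
    num_cores = len(contrib[0])
    # glob[node][core][chunk]: per-core global allreduce results
    glob = [[[0.0] * num_cores for _ in range(num_cores)] for _ in range(num_nodes)]

    # Global phase: C iterations; in each, core c on every node joins ring
    # (c + iteration) % C, so no core is assigned to the same ring twice.
    for iteration in range(num_cores):
        for ring in range(num_cores):
            core = (ring - iteration) % num_cores  # core assigned to this ring
            members = [contrib[node][core][ring] for node in range(num_nodes)]
            reduced = ring_allreduce(members)
            for node in range(num_nodes):
                glob[node][core][ring] = reduced[node]

    # Local phase: combine the per-core global results within each node and
    # give every core on the node a copy of the final vector.
    result = []
    for node in range(num_nodes):
        total = [sum(glob[node][c][chunk] for c in range(num_cores))
                 for chunk in range(num_cores)]
        result.append([list(total) for _ in range(num_cores)])
    return result


if __name__ == "__main__":
    # 2 nodes x 2 cores, each contribution split into 2 chunks
    data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(allreduce_with_rings(data))  # prints [[[16, 20], [16, 20]], [[16, 20], [16, 20]]]
```

Rotating the assignment means every core takes part in every ring exactly once, so each chunk of each core's data crosses the nodes on exactly one ring before the on-node (local) allreduce combines the per-core results. The sketch always feeds fresh contribution chunks into each global allreduce; it does not model the abstract's option of reusing global allreduce results from earlier iterations.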

Research Organization: International Business Machines Corp., Armonk, NY (United States)
Sponsoring Organization: USDOE
DOE Contract Number: B554331
Assignee: International Business Machines Corporation (Armonk, NY)
Patent Number(s): 8,484,440
Application Number: 12/124,745
Country of Publication: United States
Language: English
