Contention-free Routing for Shift-based Communication in MPI Applications on Large-scale Infiniband Clusters
Shift-based communication can be defined as follows. Given a set of N nodes assigned to a job, assign each node a unique ID from 0 to N - 1. Shift-based communication proceeds in N - 1 steps. Let D denote the current step, iterating from 1 to N - 1. In step D, the node with ID = I sends to the node with ID = (I + D) % N and receives from the node with ID = (I - D + N) % N, where % denotes the modulo operation. Figure 1 illustrates the communication patterns for various steps in shift-based communication. Because each node has exactly one send partner and one receive partner in every step, these patterns enable all nodes in a job to send and receive data simultaneously in all steps. Many MPI operations employ shift-based communication patterns for this reason. These include large-message collective algorithms, such as those typically used to implement large-message Allgather and Alltoall collectives. They also include small-message collective algorithms, such as pairwise exchange, barrier dissemination, and Bruck's index algorithm. Although the small-message algorithms typically pack messages so that each node does not send to or receive from every other node directly, the communication patterns they do execute correspond to particular steps in shift-based communication. Common point-to-point message patterns, such as nearest-neighbor exchanges in domain decomposition codes, can also benefit from efficient shift-based routing. Supporting efficient shift-based communication within a job thus provides good performance for a number of common MPI operations.
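The definition above can be checked with a minimal Python sketch. It is an illustration only, not code from the report: the function names `shift_partners`, `shift_schedule`, and `dissemination_steps` are hypothetical, and the sketch models only which node IDs pair up in each step, not the MPI calls or the InfiniBand routing. The last helper assumes the usual formulation of a dissemination barrier, in which round k uses a distance of 2^k, to show how a small-message algorithm uses a subset of the shift steps.

```python
def shift_partners(node_id, step, num_nodes):
    """Return (send_to, recv_from) for one node in one shift step.

    Follows the definition in the abstract: node I sends to (I + D) % N
    and receives from (I - D + N) % N, for step D in 1..N-1.
    """
    if not 0 <= node_id < num_nodes:
        raise ValueError("node_id must be in [0, num_nodes)")
    if not 1 <= step < num_nodes:
        raise ValueError("step must be in [1, num_nodes - 1]")
    send_to = (node_id + step) % num_nodes
    recv_from = (node_id - step + num_nodes) % num_nodes
    return send_to, recv_from


def shift_schedule(num_nodes):
    """Map each step D to the list of (send_to, recv_from) pairs, indexed by node ID."""
    return {
        step: [shift_partners(i, step, num_nodes) for i in range(num_nodes)]
        for step in range(1, num_nodes)
    }


def dissemination_steps(num_nodes):
    """Shift distances 1, 2, 4, ... below N (assumed dissemination-barrier rounds)."""
    steps = []
    distance = 1
    while distance < num_nodes:
        steps.append(distance)
        distance *= 2
    return steps


if __name__ == "__main__":
    n = 4
    for step, partners in shift_schedule(n).items():
        for node, (dst, src) in enumerate(partners):
            print(f"step {step}: node {node} sends to {dst}, receives from {src}")
```

In every step the send targets form a permutation of the node IDs, so each node is the destination of exactly one message. That is the endpoint-level property that lets all nodes send and receive simultaneously; whether the messages also avoid contention on shared links depends on the routing, which this sketch does not model.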
- Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: W-7405-ENG-48
- OSTI ID: 967277
- Report Number(s): LLNL-TR-418522
- Country of Publication: United States
- Language: English
Similar Records
- SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems · Conference · Tue Jun 25 00:00:00 EDT 2013 · Proceedings of the 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013) · OSTI ID: 1567334
- Scalable Algorithms for MPI Intergroup Allgather and Allgatherv · Journal Article · Mon Apr 29 20:00:00 EDT 2019 · Parallel Computing · OSTI ID: 1577476
- Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems · Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID: 1049195