U.S. Department of Energy
Office of Scientific and Technical Information

Contention-free Routing for Shift-based Communication in MPI Applications on Large-scale Infiniband Clusters

Technical Report · DOI: https://doi.org/10.2172/967277 · OSTI ID: 967277
Shift-based communication can be defined as follows. For a set of N nodes assigned to a job, assign each node an ID from 0 to N - 1. Shift-based communication involves N - 1 steps. Let D denote the current step, with D iterating from 1 to N - 1. In step D, the node with ID = I sends to the node with ID = (I + D) % N and receives from the node with ID = (I - D + N) % N, where % denotes the modulo operation. Figure 1 illustrates the communication patterns for various steps in shift-based communication. Because every node has exactly one send partner and one receive partner in each step, shift-based communication patterns enable all nodes in a job to send and receive data simultaneously in all steps. Many MPI operations employ shift-based communication patterns for this reason. These include large-message collective algorithms, such as those typically used to implement large-message Allgather and Alltoall collectives. They also include small-message collective algorithms, such as pair-wise exchange, barrier dissemination, and Bruck's index algorithm. Although the small-message algorithms typically pack messages so that each node does not send to or receive from every other node directly, the communication patterns they do execute correspond to particular steps in shift-based communication. Common point-to-point message patterns, such as nearest-neighbor exchanges in domain decomposition codes, can also benefit from efficient shift-based routing. Supporting efficient shift-based communication within a job thus provides good performance for a number of common MPI operations.
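The partner rule in the definition above can be illustrated with a short Python sketch. It only enumerates who sends to whom in each step; it performs no MPI calls and says nothing about routing. The function names `shift_partners` and `shift_schedule` are hypothetical and chosen for illustration; they do not come from the report.

```python
def shift_partners(node_id, step, num_nodes):
    """Return (send_to, recv_from) for one node in one step of a shift pattern.

    node_id   -- the node's ID I, in 0..N-1
    step      -- the current step D, in 1..N-1
    num_nodes -- the number of nodes N in the job
    """
    if not 1 <= step <= num_nodes - 1:
        raise ValueError("step must be in 1..N-1")
    send_to = (node_id + step) % num_nodes
    # Adding N before the modulo keeps the operand non-negative, as in the text.
    recv_from = (node_id - step + num_nodes) % num_nodes
    return send_to, recv_from


def shift_schedule(num_nodes):
    """Map each step D to a list of (send_to, recv_from) pairs indexed by node ID."""
    return {
        step: [shift_partners(i, step, num_nodes) for i in range(num_nodes)]
        for step in range(1, num_nodes)
    }


if __name__ == "__main__":
    # Illustrative run with N = 4 nodes (an arbitrary choice for this sketch).
    for step, partners in shift_schedule(4).items():
        print("step", step, partners)
```

In each step the send targets form a permutation of the node IDs, which is why every node can send and receive at the same time: no two nodes share a destination within a step.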
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
967277
Report Number(s):
LLNL-TR-418522
Country of Publication:
United States
Language:
English

Similar Records

SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems
Conference · Tue Jun 25 00:00:00 EDT 2013 · PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013) · OSTI ID:1567334

Scalable Algorithms for MPI Intergroup Allgather and Allgatherv
Journal Article · Mon Apr 29 20:00:00 EDT 2019 · Parallel Computing · OSTI ID:1577476

Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID:1049195