OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems

Conference · Proceedings of the 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013)

Scientific applications use collective communication operations in the Message Passing Interface (MPI) for global synchronization and data exchange. Alltoall and AlltoallV are two important collective operations, used by MPI jobs to exchange messages among all MPI processes. AlltoallV is a generalization of Alltoall that supports messages of varying sizes. However, the existing MPI AlltoallV implementation has linear complexity: each process must send messages to every other process in the job. Such linear complexity can result in suboptimal scalability when MPI applications are deployed on millions of cores. To address this challenge, this paper introduces a new Scalable LOgarithmic AlltoallV algorithm, named SLOAV, for the MPI AlltoallV collective operation. SLOAV performs the global exchange of small messages of different sizes in a logarithmic number of rounds. Furthermore, given the prevalence of multicore systems with shared memory, we design a hierarchical AlltoallV algorithm on top of SLOAV, referred to as SLOAVx, which leverages shared memory within each node. Compared to SLOAV, SLOAVx significantly reduces inter-node communication, improving overall system performance and mitigating the impact of message latency. We have implemented and embedded both algorithms in Open MPI. Our evaluation on large-scale computer systems shows that for an 8-byte, 1024-process AlltoallV operation, SLOAV reduces latency by as much as 86.4% compared to the state of the art, and SLOAVx further improves on SLOAV by up to 83.1% in message latency on multicore systems. In addition, experiments with the NAS Parallel Benchmarks (NPB) demonstrate that our algorithms are effective for real-world applications.
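For context, the MPI_Alltoallv interface that both algorithms target is shown in the minimal sketch below. This is ordinary MPI usage, not the paper's code; the message-size pattern (rank r sends dest+1 integers to each rank dest) is an illustrative assumption chosen to make the per-destination counts vary, which is what distinguishes AlltoallV from Alltoall.

    /* alltoallv_demo.c: a minimal sketch of the MPI_Alltoallv call.
     * Build with: mpicc alltoallv_demo.c -o alltoallv_demo */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Illustrative assumption: rank r sends (i + 1) ints to rank i,
         * so every rank receives (rank + 1) ints from each peer. */
        int *sendcounts = malloc(size * sizeof(int));
        int *recvcounts = malloc(size * sizeof(int));
        int *sdispls    = malloc(size * sizeof(int));
        int *rdispls    = malloc(size * sizeof(int));

        int stotal = 0, rtotal = 0;
        for (int i = 0; i < size; i++) {
            sendcounts[i] = i + 1;
            recvcounts[i] = rank + 1;
            sdispls[i] = stotal;
            rdispls[i] = rtotal;
            stotal += sendcounts[i];
            rtotal += recvcounts[i];
        }

        int *sendbuf = malloc(stotal * sizeof(int));
        int *recvbuf = malloc(rtotal * sizeof(int));
        for (int i = 0; i < stotal; i++) sendbuf[i] = rank;

        /* A direct implementation exchanges these blocks in O(P) steps;
         * SLOAV reorganizes the exchange into O(log P) rounds for small
         * messages, and SLOAVx adds a shared-memory intra-node stage. */
        MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                      recvbuf, recvcounts, rdispls, MPI_INT,
                      MPI_COMM_WORLD);

        printf("rank %d received %d ints in total\n", rank, rtotal);

        free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
        free(sendbuf); free(recvbuf);
        MPI_Finalize();
        return 0;
    }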
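The logarithmic round count can be made concrete with a Bruck-style exchange schedule, sketched below purely as an illustration; the paper's actual SLOAV algorithm is not reproduced here and may differ in its details. In round k, rank r forwards data to rank (r + 2^k) mod P, so ceil(log2 P) rounds suffice instead of P - 1 direct sends.

    /* schedule_demo.c: prints a Bruck-style communication schedule
     * (an assumed illustration of logarithmic rounds, not SLOAV itself). */
    #include <stdio.h>

    int main(void)
    {
        int P = 8; /* assumed process count */
        for (int step = 1, round = 0; step < P; step <<= 1, round++) {
            printf("round %d:\n", round);
            for (int r = 0; r < P; r++)
                printf("  rank %d -> rank %d\n", r, (r + step) % P);
        }
        /* With P = 8, the exchange finishes in 3 rounds rather than
         * the 7 pairwise sends a linear algorithm would need. */
        return 0;
    }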

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
OSTI ID:
1567334
Conference Information:
Proceedings of the 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013), May 13-16, 2013, Delft, Netherlands
Country of Publication:
United States
Language:
English