Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM

Potluri, Sreeram; Goswami, Anshuman; Venkata, Manjunath Gorentla; Imam, Neena

doi:10.1007/978-3-319-73814-7_6

Conference · August 7, 2017

DOI: https://doi.org/10.1007/978-3-319-73814-7_6 · OSTI ID: 1567474

Potluri, Sreeram [1]; Goswami, Anshuman [1]; Venkata, Manjunath Gorentla [2]; Imam, Neena [2]

[1] NVIDIA Corporation, Santa Clara, CA (United States)
[2] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. Our BFS implementation achieves up to a 75% performance improvement over a CUDA-aware MPI-based implementation.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

OSTI ID: 1567474

Country of Publication: United States

Language: English

Similar Records

GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM

Conference · Thu Nov 30 23:00:00 EST 2017 · OSTI ID:1427708

Using Hybrid Model OpenSHMEM + CUDA to Implement the SHOC Benchmark Suite

Conference · Thu Aug 04 00:00:00 EDT 2016 · OSTI ID:1567410

Establish the basis for Breadth-First Search on Frontier System: XBFS on AMD GPUs

Conference · Fri Nov 01 00:00:00 EDT 2024 · OSTI ID:2584525

Related Subjects

97 MATHEMATICS AND COMPUTING
Computer Science
