Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM

Potluri, Sreeram; Goswami, Anshuman; Venkata, Manjunath Gorentla; Imam, Neena

doi:10.1007/978-3-319-73814-7_6

Conference · August 7, 2017

DOI: https://doi.org/10.1007/978-3-319-73814-7_6 · OSTI ID: 1567474

Potluri, Sreeram [1]; Goswami, Anshuman [1]; Venkata, Manjunath Gorentla [2]; Imam, Neena [2]

[1] NVIDIA Corporation, Santa Clara, CA (United States)
[2] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. In the best case, our BFS implementation achieves a 75% improvement in performance over a CUDA-aware MPI-based implementation.

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

OSTI ID: 1567474

Resource Relation: Conference: OpenSHMEM 2017: OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Computer Science
