
Title: Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM

Abstract

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. In the best case, our BFS implementation achieves a 75% performance improvement over a CUDA-aware MPI-based implementation.

Authors:
Potluri, Sreeram [1]; Goswami, Anshuman [1]; Venkata, Manjunath Gorentla [2]; Imam, Neena [2]
  1. NVIDIA Corporation, Santa Clara, CA (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567474
Resource Type:
Conference
Resource Relation:
Conference: OpenSHMEM 2017: OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science

Citation Formats

Potluri, Sreeram, Goswami, Anshuman, Venkata, Manjunath Gorentla, and Imam, Neena. Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM. United States: N. p., 2017. Web. doi:10.1007/978-3-319-73814-7_6.
Potluri, Sreeram, Goswami, Anshuman, Venkata, Manjunath Gorentla, & Imam, Neena. (2017). Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM. United States. doi:10.1007/978-3-319-73814-7_6.
Potluri, Sreeram, Goswami, Anshuman, Venkata, Manjunath Gorentla, and Imam, Neena. 2017. "Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM". United States. doi:10.1007/978-3-319-73814-7_6.
@article{osti_1567474,
title = {Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM},
author = {Potluri, Sreeram and Goswami, Anshuman and Venkata, Manjunath Gorentla and Imam, Neena},
abstractNote = {NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. In the best case, our BFS implementation achieves a 75% performance improvement over a CUDA-aware MPI-based implementation.},
doi = {10.1007/978-3-319-73814-7_6},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2017},
month = {8}
}


