
Title: Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM

Conference
 [1];  [1];  [2];  [2]
  1. NVIDIA Corporation, Santa Clara, CA (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. Our BFS implementation achieves up to a 75% improvement in performance over a CUDA-aware MPI-based implementation.

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
OSTI ID:
1567474
Resource Relation:
Conference: OpenSHMEM 2017: OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence
Country of Publication:
United States
Language:
English
