U.S. Department of Energy
Office of Scientific and Technical Information

Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading

Conference
 [1];  [2];  [2];  [3];  [4]
  1. Stony Brook Univ., Stony Brook, NY (United States)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
  3. Argonne National Lab. (ANL), Argonne, IL (United States)
  4. Stony Brook Univ., Stony Brook, NY (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)
The latest OpenMP standard offers automatic device offloading capabilities that facilitate GPU programming. Despite this, many challenges remain. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for a unified memory space, in which the CPU and GPU can access each other's memory transparently; data movement is managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, little is known about how this mechanism performs and how programmers should use it. We modified several benchmark codes from the Rodinia benchmark suite to study the behavior of the OpenMP accelerator extensions, and used them to explore the impact of unified memory in an OpenMP context. We also modified the open-source LLVM compiler to allow OpenMP programs to exploit unified memory. Our evaluation reveals that while the performance of unified memory is comparable to that of normal GPU offloading for benchmarks with little data reuse, it suffers significant overhead when GPU memory is oversubscribed for benchmarks with large amounts of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.
Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
SC0012704
OSTI ID:
1412779
Report Number(s):
BNL--114801-2017-JA
Country of Publication:
United States
Language:
English
