U.S. Department of Energy
Office of Scientific and Technical Information

Manage OpenMP GPU Data Environment Under Unified Address Space

Conference
OpenMP has supported the offload of computations to accelerators such as GPUs since version 4.0. A crucial aspect of OpenMP offloading is managing the accelerator data environment. Currently, this has to be explicitly programmed by users, which is non-trivial and often results in suboptimal performance. The unified memory feature available in recent GPU architectures introduces another option: implicit management. However, our experiments show that it incurs several performance issues, especially under GPU memory oversubscription. In this paper, we propose a compiler and runtime collaborative approach to manage OpenMP GPU data under unified memory. In our framework, the compiler performs data reuse analysis to assist runtime data management. The runtime combines static and dynamic information to make optimized data management decisions. We have implemented the proposed technology in the LLVM framework. The evaluation shows that our method can achieve significant performance improvement for OpenMP GPU offloading.
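For readers unfamiliar with the explicit style the abstract refers to, the following is a minimal sketch of user-programmed data management with standard OpenMP `map` clauses. It is not the paper's compiler/runtime approach; the function name `scale_then_add` and its arguments are hypothetical, chosen only for illustration. The array `x` is reused by two kernels, so a single enclosing `target data` region keeps it resident on the device instead of transferring it once per kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes y = a*x + x in two offloaded kernels.
 * Both kernels read x, so the enclosing target data region maps it once. */
void scale_then_add(const double *x, double *y, int n, double a)
{
    assert(x != NULL && y != NULL && n > 0);

    /* Explicit data environment, written by the user:
     *   x is copied host -> device on entry (to),
     *   y is copied device -> host on exit (from). */
    #pragma omp target data map(to: x[0:n]) map(from: y[0:n])
    {
        /* Kernel 1: y = a * x */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i];

        /* Kernel 2: y += x, reusing the device copies of x and y */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] += x[i];
    }
}
```

If the code is built without OpenMP support, the pragmas are ignored and both loops run on the host with the same result. Choosing the extent of the `target data` region and the direction of each `map` clause is the manual step that the abstract describes as non-trivial.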
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
SC0012704
OSTI ID:
1484438
Report Number(s):
BNL-209639-2018-COPA
Country of Publication:
United States
Language:
English
