Manage OpenMP GPU Data Environment Under Unified Address Space
Abstract
OpenMP has supported the offloading of computations to accelerators such as GPUs since version 4.0. A crucial aspect of OpenMP offloading is managing the accelerator data environment. Currently, this has to be explicitly programmed by users, which is non-trivial and often results in suboptimal performance. The unified memory feature available in recent GPU architectures introduces another option: implicit management. However, our experiments show that it incurs several performance issues, especially under GPU memory oversubscription. In this paper, we propose a collaborative compiler and runtime approach to managing OpenMP GPU data under unified memory. In our framework, the compiler performs data reuse analysis to assist runtime data management, and the runtime combines static and dynamic information to make optimized data management decisions. We have implemented the proposed technology in the LLVM framework. The evaluation shows that our method can achieve significant performance improvements for OpenMP GPU offloading.
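To make the "explicitly programmed by users" option concrete, here is a minimal sketch of explicit data management. It is an illustration only, not code from the paper, and the function and array names are hypothetical. Two offloaded kernels both read array `a`, so an enclosing `target data` region with `map` clauses copies `a` to the GPU once, keeps the scratch array `b` on the device only, and copies the result `c` back once. It avoids transferring data around each kernel. Without an offloading compiler or a GPU, the pragmas are either ignored or the regions fall back to the host, and the results are the same.

```c
/* Hypothetical example of explicit OpenMP device data management.
 * Computes c = a + 2*a using two offloaded kernels that share data. */
void two_kernels_explicit(const double *a, double *b, double *c, int n)
{
    /* Map a to the device once, allocate b only on the device,
     * and copy c back to the host when the region ends. */
    #pragma omp target data map(to: a[0:n]) map(alloc: b[0:n]) map(from: c[0:n])
    {
        /* Kernel 1: b = 2 * a. The device copy of b is never copied back. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            b[i] = 2.0 * a[i];

        /* Kernel 2: c = a + b, reusing the device copies of a and b
         * that are already present from the enclosing data region. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}
```

The user has to decide by hand which arrays stay resident and which are copied back. In this example, `b` is device-only and `c` is copy-back. With unified memory, data can instead migrate between host and GPU on demand. That is the implicit option the abstract describes.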
- Authors: Li, Lingda
- Publication Date: September 2018
- Research Org.: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Brookhaven National Lab. (BNL), Upton, NY (United States)
- Sponsoring Org.: USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
- OSTI Identifier: 1484438
- Report Number(s): BNL-209639-2018-COPA; Journal ID: ISSN 0302-9743
- DOE Contract Number: SC0012704
- Resource Type: Conference
- Resource Relation: Journal Volume: 11128; Conference: International Workshop on OpenMP 2018, Barcelona, Spain, 9/26/2018 - 9/28/2018
- Country of Publication: United States
- Language: English
- Subject: 97 MATHEMATICS AND COMPUTING; Data management; Unified memory; OpenMP offloading; Compiler; Runtime; LLVM
Citation Formats
MLA: Li, Lingda. Manage OpenMP GPU Data Environment Under Unified Address Space. United States: N. p., 2018. Web. doi:10.1007/978-3-319-98521-3_5.
APA: Li, Lingda. Manage OpenMP GPU Data Environment Under Unified Address Space. United States. https://doi.org/10.1007/978-3-319-98521-3_5
Chicago: Li, Lingda. 2018. "Manage OpenMP GPU Data Environment Under Unified Address Space." United States. https://doi.org/10.1007/978-3-319-98521-3_5. https://www.osti.gov/servlets/purl/1484438.
BibTeX:
@article{osti_1484438,
title = {Manage OpenMP GPU Data Environment Under Unified Address Space},
author = {Li, Lingda},
abstractNote = {OpenMP has supported the offloading of computations to accelerators such as GPUs since version 4.0. A crucial aspect of OpenMP offloading is managing the accelerator data environment. Currently, this has to be explicitly programmed by users, which is non-trivial and often results in suboptimal performance. The unified memory feature available in recent GPU architectures introduces another option: implicit management. However, our experiments show that it incurs several performance issues, especially under GPU memory oversubscription. In this paper, we propose a collaborative compiler and runtime approach to managing OpenMP GPU data under unified memory. In our framework, the compiler performs data reuse analysis to assist runtime data management, and the runtime combines static and dynamic information to make optimized data management decisions. We have implemented the proposed technology in the LLVM framework. The evaluation shows that our method can achieve significant performance improvements for OpenMP GPU offloading.},
doi = {10.1007/978-3-319-98521-3_5},
url = {https://www.osti.gov/biblio/1484438},
journal = {},
issn = {0302-9743},
volume = 11128,
place = {United States},
year = {2018},
month = {9}
}