Manage OpenMP GPU Data Environment Under Unified Address Space

Li, Lingda

doi:10.1007/978-3-319-98521-3_5

Conference · September 26, 2018

DOI: https://doi.org/10.1007/978-3-319-98521-3_5 · OSTI ID: 1484438

OpenMP has supported offloading computations to accelerators such as GPUs since version 4.0. A crucial aspect of OpenMP offloading is managing the accelerator data environment. Currently, users must program this explicitly, which is nontrivial and often results in suboptimal performance. The unified memory feature available in recent GPU architectures introduces another option: implicit management. However, our experiments show that it incurs several performance issues, especially under GPU memory oversubscription. In this paper, we propose a collaborative compiler and runtime approach to manage OpenMP GPU data under unified memory. In our framework, the compiler performs data reuse analysis to assist runtime data management. The runtime combines static and dynamic information to make optimized data management decisions. We have implemented the proposed technology in the LLVM framework. The evaluation shows that our method can achieve significant performance improvement for OpenMP GPU offloading.
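To make the "explicit management" mentioned in the abstract concrete, the following is a minimal sketch of what a user writes by hand with standard OpenMP 4.x directives. It is not taken from the paper: the function name `scale_then_add` and its parameters are hypothetical, chosen only for illustration. The `target data` region keeps both arrays resident on the device across two kernels, so the reuse of `b` does not trigger a second transfer. Under implicit management via unified memory, the `map` clauses would in principle become unnecessary (OpenMP 5.0 later added `requires unified_shared_memory` for this purpose), leaving data movement to the driver and runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes b[i] = factor * a[i] + a[i] in two kernels.
 * The enclosing target data region is the explicit data management:
 * a is copied to the device once, b is copied back once at the end. */
void scale_then_add(const double *a, double *b, size_t n, double factor)
{
    assert(a != NULL && b != NULL);

    #pragma omp target data map(to: a[0:n]) map(from: b[0:n])
    {
        /* First kernel: both arrays are already present on the device,
         * so these map clauses cause no additional transfer. */
        #pragma omp target teams distribute parallel for \
            map(to: a[0:n]) map(tofrom: b[0:n])
        for (size_t i = 0; i < n; ++i)
            b[i] = factor * a[i];

        /* Second kernel reuses b, which stayed on the device. */
        #pragma omp target teams distribute parallel for \
            map(to: a[0:n]) map(tofrom: b[0:n])
        for (size_t i = 0; i < n; ++i)
            b[i] += a[i];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the loops run on the host with the same result, which makes the sketch easy to check before building with an offloading-capable compiler.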

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Brookhaven National Laboratory (BNL), Upton, NY (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)

DOE Contract Number: SC0012704

OSTI ID: 1484438

Report Number(s): BNL-209639-2018-COPA

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Compiler
Data management
LLVM
OpenMP offloading
Runtime
Unified memory
