Heterogeneous computing with OpenMP and Hydra
- University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
Summary: High-performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages such as CUDA or OpenCL. In recent years, OpenMP has emerged as a promising alternative for supporting accelerators, offering advantages such as a single code base for the host and different accelerator types and a simple path for extending existing application codes with accelerator support. Using this support efficiently requires solving several challenges related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, which had not previously been able to use accelerators. Experiments on three architectures show that Hydra yields performance gains of up to 10× compared with CPU-only execution. Concurrent execution on the host and GPU provided additional gains of up to 20% compared with running on the GPU only.
- Sponsoring Organization: USDOE
- OSTI ID: 1603688
- Journal Information: Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20; ISSN 1532-0626
- Publisher: Wiley Blackwell (John Wiley & Sons)
- Country of Publication: United Kingdom
- Language: English
Similar Records
- Python for Development of OpenMP and CUDA Kernels for Multidimensional Data
- Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers