Heterogeneous computing with OpenMP and Hydra
- University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
Summary: High-performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages such as CUDA or OpenCL. In recent years, OpenMP has emerged as a promising alternative for supporting accelerators, offering advantages such as a single code base for the host and different accelerator types and a simple path for extending existing application codes with accelerator support. Using this support efficiently requires solving several challenges related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, which had not previously been able to use accelerators. Experiments on three architectures show that Hydra yields performance gains of up to 10× compared with CPU-only execution. Concurrent execution on the host and GPU provided additional gains of up to 20% compared with running on the GPU only.
- Sponsoring Organization: USDOE
- OSTI ID: 1603688
- Journal Information: Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20; ISSN 1532-0626
- Publisher: Wiley Blackwell (John Wiley & Sons)
- Country of Publication: United Kingdom
- Language: English
Similar Records
- Python for Development of OpenMP and CUDA Kernels for Multidimensional Data
- Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers