DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Heterogeneous computing with OpenMP and Hydra

Abstract

High-performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages, such as CUDA or OpenCL. In recent years, OpenMP has emerged as a promising alternative for supporting accelerators: it maintains a single code base for the host and different accelerator types, and it provides a simple way to extend accelerator support to existing application codes. Using this support efficiently requires solving several challenges related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, which previously could not use accelerators. Experiments on three architectures show that Hydra results in performance gains of up to 10× compared with CPU-only execution. Concurrent execution on the host and GPU resulted in additional gains of up to 20% compared to running on the GPU alone.
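
The abstract describes offloading loops to accelerators through OpenMP target directives and overlapping host and GPU execution. As a rough, minimal sketch of that idea (not code from the paper, and not Hydra's API; the vector loop, the array names, and the fixed 80/20 work split are illustrative assumptions), the following splits a loop between a deferred GPU target region and host tasks:

    #include <cstdio>
    #include <vector>
    #include <omp.h>

    int main() {
        const int n = 1 << 20;
        std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
        double *pa = a.data(), *pb = b.data(), *pc = c.data();
        const int split = static_cast<int>(0.8 * n);  // assumed GPU share of the work

        printf("offload devices available: %d\n", omp_get_num_devices());

        #pragma omp parallel
        #pragma omp single
        {
            // Deferred target task: the first 'split' iterations run on the
            // device while the host continues past this construct.
            #pragma omp target teams distribute parallel for \
                map(to: pa[0:split], pb[0:split]) map(from: pc[0:split]) nowait
            for (int i = 0; i < split; ++i)
                pc[i] = pa[i] + pb[i];

            // Remaining iterations run as host tasks, concurrently with the GPU.
            #pragma omp taskloop
            for (int i = split; i < n; ++i)
                pc[i] = pa[i] + pb[i];

            // The implicit barrier at the end of 'single' waits for both parts.
        }
        printf("c[0] = %.1f, c[n-1] = %.1f\n", pc[0], pc[n - 1]);
        return 0;
    }

With a compiler that supports OpenMP offloading (for example, clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), the target region executes on the GPU while the taskloop occupies the host threads; without a device, the target region simply falls back to host execution.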

Authors:
Diener, Matthias [1]; Kale, Laxmikant V. [1]; Bodony, Daniel J. [1]
  1. University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
Publication Date:
March 7, 2020
Sponsoring Org.:
USDOE
OSTI Identifier:
1603688
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Concurrency and Computation: Practice and Experience
Additional Journal Information:
Journal Volume: 32; Journal Issue: 20; Journal ID: ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Diener, Matthias, Kale, Laxmikant V., and Bodony, Daniel J. Heterogeneous computing with OpenMP and Hydra. United Kingdom: N. p., 2020. Web. doi:10.1002/cpe.5728.
Diener, Matthias, Kale, Laxmikant V., & Bodony, Daniel J. Heterogeneous computing with OpenMP and Hydra. United Kingdom. https://doi.org/10.1002/cpe.5728
Diener, Matthias, Kale, Laxmikant V., and Bodony, Daniel J. 2020. "Heterogeneous computing with OpenMP and Hydra". United Kingdom. https://doi.org/10.1002/cpe.5728.
@article{osti_1603688,
title = {Heterogeneous computing with OpenMP and Hydra},
author = {Diener, Matthias and Kale, Laxmikant V. and Bodony, Daniel J.},
abstractNote = {High-performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages, such as CUDA or OpenCL. In recent years, OpenMP has emerged as a promising alternative for supporting accelerators: it maintains a single code base for the host and different accelerator types, and it provides a simple way to extend accelerator support to existing application codes. Using this support efficiently requires solving several challenges related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, which previously could not use accelerators. Experiments on three architectures show that Hydra results in performance gains of up to 10× compared with CPU-only execution. Concurrent execution on the host and GPU resulted in additional gains of up to 20% compared to running on the GPU alone.},
doi = {10.1002/cpe.5728},
journal = {Concurrency and Computation: Practice and Experience},
number = 20,
volume = 32,
place = {United Kingdom},
year = {2020},
month = {mar}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1002/cpe.5728

Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)


Works referenced in this record:

Self-Adaptive OmpSs Tasks in Heterogeneous Environments
conference, May 2013

  • Planas, Judit; Badia, Rosa M.; Ayguade, Eduard
  • 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2013.53

Exploring Programming Multi-GPUs Using OpenMP and OpenACC-Based Hybrid Model
conference, May 2013

  • Xu, Rengan; Chandrasekaran, Sunita; Chapman, Barbara
  • 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2013.263

Efficient Fork-Join on GPUs Through Warp Specialization
conference, December 2017

  • Jacob, Arpith Chacko; Eichenberger, Alexandre E.; Sung, Hyojin
  • 2017 IEEE 24th International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HiPC.2017.00048

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014

  • Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.07.003

XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
conference, May 2013

  • Gautier, Thierry; Lima, Joao V. F.; Maillard, Nicolas
  • 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2013.66

The Spack package manager: bringing order to HPC software chaos
conference, January 2015

  • Gamblin, Todd; LeGendre, Matthew; Collette, Michael R.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15)
  • DOI: 10.1145/2807591.2807623

Improving the memory access locality of hybrid MPI applications
conference, January 2017

  • Diener, Matthias; White, Sam; Kale, Laxmikant V.
  • Proceedings of the 24th European MPI Users' Group Meeting (EuroMPI '17)
  • DOI: 10.1145/3127024.3127038

DawnCC: Automatic Annotation for Data Parallelism and Offloading
journal, May 2017

  • Mendonça, Gleison; Guimarães, Breno; Alves, Péricles
  • ACM Transactions on Architecture and Code Optimization, Vol. 14, Issue 2
  • DOI: 10.1145/3084540

Chai: Collaborative heterogeneous applications for integrated-architectures
conference, April 2017

  • Gómez-Luna, Juan; El Hajj, Izzat; Chang, Li-Wen
  • 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • DOI: 10.1109/ISPASS.2017.7975269

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
journal, November 2010

  • Augonnet, Cédric; Thibault, Samuel; Namyst, Raymond
  • Concurrency and Computation: Practice and Experience, Vol. 23, Issue 2
  • DOI: 10.1002/cpe.1631

HPX: A Task Based Programming Model in a Global Address Space
conference, January 2014

  • Kaiser, Hartmut; Heller, Thomas; Adelstein-Lelbach, Bryce
  • Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14
  • DOI: 10.1145/2676870.2676883

Legion: Expressing locality and independence with logical regions
conference, November 2012

  • Bauer, Michael; Treichler, Sean; Slaughter, Elliott
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12)
  • DOI: 10.1109/SC.2012.71

Performance analysis of OpenMP on a GPU using a CORAL proxy application
conference, January 2015

  • Bercea, Gheorghe-Teodor; Appelhans, David; O'Brien, Kevin
  • Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems - PMBS '15
  • DOI: 10.1145/2832087.2832089

A uniform approach for programming distributed heterogeneous computing systems
journal, December 2014

  • Grasso, Ivan; Pellegrini, Simone; Cosenza, Biagio
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.08.002

A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters
conference, November 2014

  • Noack, Matthias; Wende, Florian; Steinke, Thomas
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2014.22

Directive-based Programming Models for Scientific Applications - A Comparison
conference, November 2012

  • Xu, Rengan; Chandrasekaran, Sunita; Chapman, Barbara
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
  • DOI: 10.1109/SCC.2012.6522594

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing
conference, September 2016

  • Sun, Yifan; Gong, Xiang; Ziabari, Amir Kavyan
  • 2016 IEEE International Symposium on Workload Characterization (IISWC)
  • DOI: 10.1109/IISWC.2016.7581262