Title: Heterogeneous computing with OpenMP and Hydra

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5728 · OSTI ID: 1603688

Summary: High‐performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages, such as CUDA or OpenCL. In recent years, OpenMP emerged as a promising alternative for supporting accelerators, providing advantages such as maintaining a single code base for the host and different accelerator types and providing a simple way to extend support for accelerators to existing application codes. Efficiently using this support requires solving several challenges, related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, that has not previously been able to use accelerators. Experiments on three architectures show that Hydra results in performance gains of up to 10× compared with CPU‐only execution. Concurrent execution on the host and GPU resulted in additional gains of up to 20% compared to running on the GPU only.

Sponsoring Organization:
USDOE
Journal Information:
Journal Name: Concurrency and Computation. Practice and Experience; Vol. 32; Journal Issue: 20; ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English
Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)
