U.S. Department of Energy
Office of Scientific and Technical Information

OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems

Conference

This work evaluated the use of OpenMP tasking with target GPU offloading as a potential solution for programming productivity and performance on heterogeneous systems. The work also proposes an extension to the OpenMP specification, OpenMP target task, which integrates OpenMP tasking and target GPU offloading in a single OpenMP pragma to make heterogeneous codes simpler to implement. As a test case, the authors used one of the most popular and widely used Level-3 Basic Linear Algebra Subprograms (BLAS) routines: the triangular solver (TRSM). To exploit the heterogeneity of current high-performance computing systems, the authors propose a different parallelization of the algorithm based on a nonuniform decomposition of the problem, with target GPU offloading placed inside OpenMP tasks to address the heterogeneity of the hardware. This approach can outperform state-of-the-art algorithms, which use a uniform decomposition of the data, on both CPU-only and hybrid CPU-GPU systems, reaching speedups of up to one order of magnitude. It is faster than the IBM ESSL math library on CPU and competitive with a highly optimized heterogeneous CUDA version. One node of Oak Ridge National Laboratory’s supercomputer, Summit, was used for performance analysis.
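The abstract describes placing target offloading inside OpenMP tasks so that part of a TRSM runs on the GPU while the rest runs on CPU cores. The sketch below is a minimal illustration of that pattern in C, written with standard OpenMP 4.5+ directives (a `task` that wraps a `target` region). It does not use the `target task` construct the authors propose, whose syntax is not given here. Every detail is an assumption for illustration. The problem is taken to be the left-side lower-triangular solve L * X = B in column-major storage. The names `hybrid_trsm`, `trsm_cols`, `gpu_cols`, and `cpu_chunk` are hypothetical. The "nonuniform decomposition" is modeled simply as one large block of right-hand-side columns sent to the device plus smaller host tasks, which is not necessarily the decomposition used in the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Forward substitution for columns [c0, c1) of B: solves L * X = B in place.
   L is n x n lower triangular, B is n x m; both column-major, leading dimension n. */
#pragma omp declare target
static void trsm_cols(int n, const double *L, double *B, int c0, int c1) {
    for (int j = c0; j < c1; ++j) {
        double *b = B + (size_t)j * n;              /* column j of B */
        for (int k = 0; k < n; ++k) {
            b[k] /= L[k + (size_t)k * n];           /* divide by diagonal entry */
            for (int i = k + 1; i < n; ++i)         /* eliminate x_k from rows below */
                b[i] -= L[i + (size_t)k * n] * b[k];
        }
    }
}
#pragma omp end declare target

/* Hypothetical hybrid driver: columns [0, gpu_cols) are solved by one task that
   offloads to the default device; the remaining columns are solved on the host
   in tasks of cpu_chunk columns each, giving a nonuniform split of the work. */
void hybrid_trsm(int n, int m, double *L, double *B, int gpu_cols, int cpu_chunk) {
    assert(gpu_cols > 0 && gpu_cols <= m && cpu_chunk > 0);
    #pragma omp parallel
    #pragma omp single
    {
        /* Device task: its thread waits for the target region to finish,
           while other threads in the team can execute the host tasks. */
        #pragma omp task
        {
            #pragma omp target teams distribute parallel for \
                map(to: L[0:n*n]) map(tofrom: B[0:n*gpu_cols])
            for (int j = 0; j < gpu_cols; ++j)
                trsm_cols(n, L, B, j, j + 1);
        }
        /* Host tasks for the remaining columns; no column is touched twice. */
        for (int c = gpu_cols; c < m; c += cpu_chunk) {
            int c1 = (c + cpu_chunk < m) ? c + cpu_chunk : m;
            #pragma omp task firstprivate(c, c1)
            trsm_cols(n, L, B, c, c1);
        }
        #pragma omp taskwait   /* wait for the device task and all host tasks */
    }
}
```

By default, if no GPU is available, the `target` region falls back to the host. Without `-fopenmp`, the pragmas are ignored and the code runs sequentially. In both cases the numerical result is the same. In practice the ratio between `gpu_cols` and the host columns would be tuned to the relative throughput of the devices.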

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1885285
Country of Publication:
United States
Language:
English


Similar Records

An OpenMP GPU-offload implementation of a non-equilibrium solidification cellular automata model for additive manufacturing
Journal Article · Wed Nov 23 23:00:00 EST 2022 · Computer Physics Communications · OSTI ID:1908088

Concepts for OpenMP Target Offload Resilience
Conference · Thu Aug 01 00:00:00 EDT 2019 · OSTI ID:1570122

Targeting GPUs with OpenMP directives on Summit: A simple and effective Fortran experience
Journal Article · Tue Aug 20 00:00:00 EDT 2019 · Parallel Computing · OSTI ID:1569391
