OSTI.GOV · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

Conference

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high-performance, host-callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same calling signatures as their CPU counterparts. Recently, with sufficient support for C++ templates in CUDA, the emergence of template libraries has enabled further advances in code reusability and rapid software development for GPUs. However, Expression Templates (ET), which have been very popular for implementing data-parallel scientific software for host CPUs because of their intuitive and mathematics-like syntax, have been underutilized by GPU development libraries. This underutilization stems from the difficulty of offloading expression templates from host to GPU: instantiated expressions cannot be passed to GPU kernels, and the exact form of the expressions is not known at coding time. This paper presents a general approach that automatically offloads C++ expression templates to CUDA-enabled GPUs: C++ metaprogramming and Just-In-Time (JIT) compilation are used to generate and compile CUDA kernels for the corresponding expression templates, which are then executed with the appropriate arguments. This approach allows developers to port applications to run on GPUs with virtually no code modifications. More specifically, this paper uses QDP++, a large ET-based data-parallel physics library, as an example to illustrate many aspects of the automatic offloading approach and to demonstrate very good speedups for typical QDP++ applications running on GPUs compared with running on CPUs. In addition, this approach of automatically offloading expression templates could be applied to other many-core accelerators whose C++ programming toolkits support C++ templates.
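The general idea can be illustrated with a minimal, self-contained sketch (not the paper's QDP++/JIT implementation): an expression template whose type encodes an element-wise formula is turned into CUDA C source at runtime, JIT-compiled with NVRTC, and launched through the CUDA driver API. The ET types (ArgA, ArgB, Add) and the helper kernel_source are illustrative assumptions introduced here for the example; error checking is omitted for brevity.

// Sketch: generate CUDA source from an expression template, JIT-compile it
// with NVRTC, and launch the resulting kernel via the CUDA driver API.
// ArgA, ArgB, Add, and kernel_source are hypothetical illustration names,
// not the QDP++/JIT code described in the paper.
#include <cuda.h>
#include <nvrtc.h>
#include <cstdio>
#include <string>
#include <vector>

// Leaf and node types: each knows how to print its per-element expression.
struct ArgA { static std::string expr() { return "a[i]"; } };
struct ArgB { static std::string expr() { return "b[i]"; } };
template <typename L, typename R>
struct Add { static std::string expr() { return "(" + L::expr() + " + " + R::expr() + ")"; } };

// Generate CUDA C source for the expression encoded in type E.
template <typename E>
std::string kernel_source() {
    return "extern \"C\" __global__ void et_kernel(float* out, const float* a, "
           "const float* b, int n) {\n"
           "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
           "  if (i < n) out[i] = " + E::expr() + ";\n}\n";
}

int main() {
    const int n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n, 0.0f);

    // 1. Build the kernel source from the expression template for a + b.
    std::string src = kernel_source<Add<ArgA, ArgB>>();

    // 2. JIT-compile the source to PTX with NVRTC.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src.c_str(), "et_kernel.cu", 0, nullptr, nullptr);
    nvrtcCompileProgram(prog, 0, nullptr);
    size_t ptx_size;
    nvrtcGetPTXSize(prog, &ptx_size);
    std::string ptx(ptx_size, '\0');
    nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);

    // 3. Load the PTX and launch the kernel through the driver API.
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoadData(&mod, ptx.c_str());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "et_kernel");

    CUdeviceptr d_out, d_a, d_b;
    cuMemAlloc(&d_out, n * sizeof(float));
    cuMemAlloc(&d_a,   n * sizeof(float));
    cuMemAlloc(&d_b,   n * sizeof(float));
    cuMemcpyHtoD(d_a, a.data(), n * sizeof(float));
    cuMemcpyHtoD(d_b, b.data(), n * sizeof(float));

    int n_arg = n;
    void* args[] = { &d_out, &d_a, &d_b, &n_arg };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemcpyDtoH(out.data(), d_out, n * sizeof(float));
    std::printf("out[0] = %f\n", out[0]);  // expect 3.0 for a[i]=1, b[i]=2

    cuMemFree(d_out); cuMemFree(d_a); cuMemFree(d_b);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}

Assuming a CUDA toolkit is installed, this could be built with something like g++ et_jit.cpp -lnvrtc -lcuda (plus the CUDA include and library paths); the expected output is out[0] = 3.0.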

Research Organization:
Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
Sponsoring Organization:
USDOE SC Office of Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
AC05-06OR23177
OSTI ID:
1080421
Report Number(s):
JLAB-IT-12-01; DOE/OR/23177-2572
Resource Relation:
Conference: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 21-25 May 2012, Shanghai, China
Country of Publication:
United States
Language:
English

Similar Records

A Framework for Lattice QCD Calculations on GPUs
Conference · August 2014

QDP-JIT/PTX: A QDP++ Implementation for CUDA-Enabled GPUs
Conference · November 2014 · Proceedings of Science

Using Numba for GPU acceleration of Neutron Beamline Digital Twins
Conference · August 2023
