Title: Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

Journal Article · Parallel Computing
 [1];  [1];  [1];  [1];  [1];  [2]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Univ. of Utah, Salt Lake City, UT (United States). School of Computing

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required the use of a GPU-specific programming model such as CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near-Roofline performance (Roofline being a performance bound for a given computation and target architecture) across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures, as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
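
The Roofline bound mentioned in the abstract can be illustrated with a minimal sketch: attainable performance is the lesser of the machine's peak compute rate and the product of the kernel's arithmetic intensity and the peak memory bandwidth. The function name `roofline_gflops`, the machine figures, and the per-point flop and byte counts below are illustrative assumptions, not values taken from the paper.

```python
def roofline_gflops(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity):
    """Roofline bound in GFLOP/s: min(peak compute, AI * peak memory bandwidth).

    arithmetic_intensity is in flops per byte of memory traffic.
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


# Hypothetical machine (assumed numbers, not from the paper).
PEAK_GFLOPS = 1000.0        # peak double-precision compute, GFLOP/s
PEAK_BANDWIDTH_GBS = 200.0  # peak memory bandwidth, GB/s

# Hypothetical stencil sweep (assumed counts): 8 flops per grid point and
# 16 bytes of memory traffic per point (one 8-byte read, one 8-byte write,
# assuming neighbouring values are served from cache).
flops_per_point = 8
bytes_per_point = 16
arithmetic_intensity = flops_per_point / bytes_per_point  # 0.5 flops/byte

bound = roofline_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, arithmetic_intensity)
print(f"Roofline bound: {bound} GFLOP/s")  # bandwidth-limited: 0.5 * 200 = 100.0
```

Under these assumed numbers the kernel is bandwidth-bound, which is typical of low-arithmetic-intensity stencil operations; "near-Roofline" performance would then mean a measured rate close to this bound rather than close to the peak compute rate.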

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725
OSTI ID:
1379823
Alternate ID(s):
OSTI ID: 1397648
Journal Information:
Parallel Computing, Vol. 64, Issue C; ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science
