Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

Basu, Protonu; Williams, Samuel; Van Straalen, Brian; Oliker, Leonid; Colella, Phillip; Hall, Mary

doi:10.1016/j.parco.2017.04.002

Journal Article · April 5, 2017 · Parallel Computing

DOI: https://doi.org/10.1016/j.parco.2017.04.002 · OSTI ID:1379823

Basu, Protonu [1]; Williams, Samuel [1]; Van Straalen, Brian [1]; Oliker, Leonid [1]; Colella, Phillip [1]; Hall, Mary [2]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[2] Univ. of Utah, Salt Lake City, UT (United States). School of Computing

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required the use of a GPU-specific programming model such as CUDA, OpenCL, or OpenACC. Thus, to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near-Roofline performance (Roofline being a performance bound for a given computation and target architecture) across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures, as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
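The Roofline bound mentioned in the abstract can be illustrated with a minimal sketch. It assumes the basic form of the model, in which attainable performance is the lesser of peak compute throughput and memory bandwidth times arithmetic intensity. The function names and all machine and kernel numbers below are hypothetical, chosen only for illustration; they are not taken from the paper or from miniGMG.

```python
def roofline_bound(peak_gflops: float, bandwidth_gbs: float,
                   flops_per_byte: float) -> float:
    """Attainable GFLOP/s under the basic Roofline model.

    A kernel is limited either by peak compute throughput or by how fast
    memory can feed it (bandwidth times arithmetic intensity), whichever
    is lower.
    """
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fraction_of_roofline(measured_gflops: float, peak_gflops: float,
                         bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Measured performance as a fraction of the Roofline bound."""
    bound = roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte)
    return measured_gflops / bound


# Hypothetical values (assumptions, not measurements from the paper).
PEAK_GFLOPS = 1000.0      # assumed peak double-precision throughput
BANDWIDTH_GBS = 200.0     # assumed sustained memory bandwidth
STENCIL_INTENSITY = 0.25  # assumed flops per byte for a stencil smoother

if __name__ == "__main__":
    bound = roofline_bound(PEAK_GFLOPS, BANDWIDTH_GBS, STENCIL_INTENSITY)
    print(f"Roofline bound: {bound} GFLOP/s")
    print("Memory-bound" if bound < PEAK_GFLOPS else "Compute-bound")
```

With a low arithmetic intensity such as the one assumed here, the bound is set by memory bandwidth rather than peak compute, which is the typical situation for stencil operations; "near-Roofline" then means measured performance close to that bandwidth-limited ceiling.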

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

Grant/Contract Number: AC02-05CH11231; AC05-00OR22725

OSTI ID: 1379823

Journal Information: Parallel Computing, Vol. 64, Issue C; ISSN 0167-8191

Publisher: Elsevier

Country of Publication: United States

Language: English

References (5)

Improving the arithmetic intensity of multigrid with the help of polynomial smoothers. Ghysels, P.; Kłosiewicz, P.; Vanroose, W. Numerical Linear Algebra with Applications, Vol. 19, Issue 2 (journal, February 2012). https://doi.org/10.1002/nla.1808
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors. Datta, Kaushik; Kamil, Shoaib; Williams, Samuel. SIAM Review, Vol. 51, Issue 1 (journal, February 2009). https://doi.org/10.1137/070693199
Roofline: an insightful visual performance model for multicore architectures. Williams, Samuel; Waterman, Andrew; Patterson, David. Communications of the ACM, Vol. 52, Issue 4 (journal, April 2009). https://doi.org/10.1145/1498765.1498785
A script-based autotuning compiler system to generate high-performance CUDA code. Khan, Malik; Basu, Protonu; Rudy, Gabe. ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4 (journal, January 2013). https://doi.org/10.1145/2400682.2400690
Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method. Zeiser, T.; Wellein, G.; Nitsure, A. Progress in Computational Fluid Dynamics, An International Journal, Vol. 8, Issue 1/2/3/4 (journal, January 2008). https://doi.org/10.1504/PCFD.2008.018088

Cited By (4)

Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs. Chen, Jieyang; Wan, Lipeng; Liang, Xin. arXiv (preprint, January 2020). https://doi.org/10.48550/arxiv.2007.04457
Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. Ma, Wenjing; Ao, Yulong; Yang, Chao. Cluster Computing, Vol. 23, Issue 2 (journal, May 2019). https://doi.org/10.1007/s10586-019-02938-w
A Survey on Compiler Autotuning using Machine Learning. Ashouri, Amir H.; Killian, William; Cavazos, John. ACM Computing Surveys, Vol. 51, Issue 5 (journal, January 2019). https://doi.org/10.1145/3197978
A Survey on Compiler Autotuning using Machine Learning. Ashouri, Amir H.; Killian, William; Cavazos, John. arXiv (text, January 2018). https://doi.org/10.48550/arxiv.1801.04405

Related Subjects

97 MATHEMATICS AND COMPUTING
Autotuning
Compiler
GPU
Multigrid
