Title: Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Journal Article · International Journal of Parallel Programming

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Once annotated with a small number of directives, sequential stencil codes written in C can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its ability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of the various data movements by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
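As a rough illustration of the concurrent CPU+GPU idea described in the abstract, the sketch below splits the interior z-planes of a 3D grid into two slabs and applies the same 7-point stencil to each. It is a minimal sketch in plain sequential C, not Panda's generated code: the names `idx`, `sweep_planes`, `split_plane`, `sweep_hybrid`, and the `gpu_fraction` parameter are all hypothetical, and the averaging stencil is an assumed stand-in for whatever kernel a real application would use.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into an nx*ny*nz grid stored with x varying fastest. */
static size_t idx(int x, int y, int z, int nx, int ny) {
    return ((size_t)z * (size_t)ny + (size_t)y) * (size_t)nx + (size_t)x;
}

/* Apply a 7-point averaging stencil to the interior points of the
 * z-planes in [z_begin, z_end). Boundary points are left untouched. */
static void sweep_planes(const double *in, double *out,
                         int nx, int ny, int nz,
                         int z_begin, int z_end) {
    assert(z_begin >= 1 && z_end <= nz - 1);
    for (int z = z_begin; z < z_end; z++) {
        for (int y = 1; y < ny - 1; y++) {
            for (int x = 1; x < nx - 1; x++) {
                out[idx(x, y, z, nx, ny)] =
                    (in[idx(x, y, z, nx, ny)] +
                     in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                     in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                     in[idx(x, y, z - 1, nx, ny)] + in[idx(x, y, z + 1, nx, ny)]) / 7.0;
            }
        }
    }
}

/* Hypothetical workload split: return the first interior z-plane that
 * belongs to the CPU slab when gpu_fraction of the planes go to the GPU. */
static int split_plane(int nz, double gpu_fraction) {
    assert(nz >= 3);
    assert(gpu_fraction >= 0.0 && gpu_fraction <= 1.0);
    int interior = nz - 2;
    return 1 + (int)(interior * gpu_fraction);
}

/* One time step computed as two independent slabs. Here both slabs run
 * sequentially on the CPU; in a hybrid code the first call would stand in
 * for a CUDA kernel launch and the second for an OpenMP-parallel loop. */
static void sweep_hybrid(const double *in, double *out,
                         int nx, int ny, int nz, double gpu_fraction) {
    int z_split = split_plane(nz, gpu_fraction);
    sweep_planes(in, out, nx, ny, nz, 1, z_split);      /* "GPU" slab */
    sweep_planes(in, out, nx, ny, nz, z_split, nz - 1); /* "CPU" slab */
}
```

Because each slab reads only from `in` and writes a disjoint part of `out`, the two calls are independent within a time step, which is what would let them run at the same time on different devices. A real hybrid code would also have to exchange the planes next to `z_split` between host and device memory, and the halo planes between MPI ranks, after every step; the abstract states that the generated code overlaps such data movement with computation.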

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1525220
Journal Information:
International Journal of Parallel Programming, Vol. 45, Issue 3; ISSN 0885-7458
Publisher:
Springer
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 14 works
Citation information provided by
Web of Science

Cited By (1)

Domain-Specific Multi-Level IR Rewriting for GPU preprint January 2020
