Scaling the memory wall using mixed-precision - HPG-MxP on an exascale-class machine
Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for AI on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low-precision formats such as FP16. However, the majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system remains largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of an HPC system on sparse matrix-based mixed-precision applications. In this work, we present an implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x on modern GPU-based supercomputers using a combination of double and single precision while maintaining the same residual level.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 3005442
- Country of Publication:
- United States
- Language:
- English