U.S. Department of Energy
Office of Scientific and Technical Information

Climbing the Summit and Pushing the Frontier of Mixed Precision Benchmarks at Extreme Scale

Conference
The rise of machine learning (ML) applications, and their use of mixed precision arithmetic to advance scientific research, are driving forces behind AI for science on high-performance computing (HPC) systems. The convergence of ML and HPC through mixed precision offers the possibility of transformational changes in computational science. The HPL-AI benchmark is designed to measure mixed precision performance, whereas the HPL benchmark measures double precision performance. Pushing systems to their limits at extreme scale is nontrivial, and little public literature explores the optimization of mixed precision computations at this scale. In this work, we demonstrate how to scale up the HPL-AI benchmark on the pre-exascale Summit and exascale Frontier systems at the Oak Ridge Leadership Computing Facility (OLCF) with a cross-platform design. We present the implementation, performance results, and guidelines on the optimization strategies employed to deliver portable performance on both AMD and NVIDIA GPUs at extreme scale.
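The idea that lets a mixed precision benchmark still produce a double precision answer is to do the expensive O(n^3) LU factorization in low precision and then recover accuracy with cheap O(n^2) refinement steps in FP64. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation or the benchmark's reference code. It uses classic iterative refinement (the benchmark's own refinement step is GMRES-based), emulates FP32 with Python's `struct` module instead of the FP16 or tensor-core arithmetic used on GPUs, and skips pivoting by assuming a diagonally dominant matrix. All function names and the test matrix are hypothetical.

```python
import struct
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def to_fp32(x: float) -> float:
    """Round an FP64 Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_fp32(a: Matrix) -> Matrix:
    """LU factorization without pivoting, rounding every operation to FP32.

    Returns packed factors: below the diagonal is L (unit diagonal implied),
    on and above the diagonal is U. Assumes a diagonally dominant input.
    """
    n = len(a)
    lu = [[to_fp32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_fp32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_fp32(lu[i][j] - to_fp32(lu[i][k] * lu[k][j]))
    return lu


def lu_solve_fp32(lu: Matrix, b: Vector) -> Vector:
    """Forward and back substitution with the packed factors, in FP32."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # solve L y = b (unit lower triangular)
        s = to_fp32(b[i])
        for j in range(i):
            s = to_fp32(s - to_fp32(lu[i][j] * y[j]))
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):  # solve U x = y
        s = y[i]
        for j in range(i + 1, n):
            s = to_fp32(s - to_fp32(lu[i][j] * x[j]))
        x[i] = to_fp32(s / lu[i][i])
    return x


def residual_fp64(a: Matrix, x: Vector, b: Vector) -> Vector:
    """r = b - A x, computed in full FP64."""
    return [b[i] - sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(b))]


def mixed_precision_solve(a: Matrix, b: Vector, max_iters: int = 10, tol: float = 1e-14) -> Vector:
    lu = lu_fp32(a)           # O(n^3) work done in low precision
    x = lu_solve_fp32(lu, b)  # initial solution, only FP32-accurate
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iters):
        r = residual_fp64(a, x, b)  # O(n^2) work in FP64
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve_fp32(lu, r)  # correction reuses the FP32 factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in FP64
    return x


if __name__ == "__main__":
    n = 6
    # Hypothetical diagonally dominant test matrix with a known solution.
    a = [[float(n) if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
    x_true = [float(i + 1) for i in range(n)]
    b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    x0 = lu_solve_fp32(lu_fp32(a), b)
    x = mixed_precision_solve(a, b)
    print("FP32-only error:", max(abs(p - q) for p, q in zip(x0, x_true)))
    print("refined error:  ", max(abs(p - q) for p, q in zip(x, x_true)))
```

The design choice mirrors the performance argument: nearly all floating-point work sits in `lu_fp32`, which is where faster low-precision hardware pays off, while each refinement pass costs only a matrix-vector product and two triangular solves.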
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1997799
Country of Publication:
United States
Language:
English


Similar Records

HPC Molecular Simulation Tries Out a New GPU: Experiences on Early AMD Test Systems for the Frontier Supercomputer
Conference · June 2022 · OSTI ID:1883870

Optimizing Communication in 2D Grid-Based MPI Applications at Exascale
Conference · September 2023 · OSTI ID:2438993

Comparative evaluation of deep learning workloads for leadership-class systems
Conference · October 2021 · OSTI ID:1838972
