U.S. Department of Energy
Office of Scientific and Technical Information

MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures

Journal Article · International Journal of High Performance Computing Applications

MAGMA (Matrix Algebra for GPU and Multicore Architectures) is a pivotal open-source library in the landscape of GPU-enabled dense and sparse linear algebra computations. With a repertoire of approximately 750 numerical routines across four precisions, MAGMA is deeply ingrained in the DOE software stack, playing a crucial role in high-performance computing. Notable projects such as ExaConstit, HiOP, MARBL, and STRUMPACK, among others, directly harness the capabilities of MAGMA. In addition, the MAGMA development team has been acknowledged multiple times for contributing to the vendors’ numerical software stacks. Looking back over the time of the Exascale Computing Project (ECP), we highlight how MAGMA has adapted to recent changes in modern HPC systems, especially the growing gap between CPU and GPU compute capabilities, as well as the introduction of low precision arithmetic in modern GPUs. We also describe MAGMA’s direct impact on several ECP projects. Maintaining portable performance across NVIDIA and AMD GPUs, and with current efforts toward supporting Intel GPUs, MAGMA ensures its adaptability and relevance in the ever-evolving landscape of GPU architectures.

Sponsoring Organization:
USDOE
Grant/Contract Number:
NONE; AC02-06CH11357
OSTI ID:
2375895
Alternate ID(s):
OSTI ID: 2429364
Journal Information:
International Journal of High Performance Computing Applications; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English
