
Title: Evolution of the SLATE linear algebra library

Journal Article · International Journal of High Performance Computing Applications

SLATE (Software for Linear Algebra Targeting Exascale) is a distributed, dense linear algebra library targeting both CPU-only and GPU-accelerated systems, developed over the course of the Exascale Computing Project (ECP). While it began with several documents setting out its initial design, significant design changes occurred throughout its development. In some cases, these were anticipated: an early version used a simple consistency flag that was later replaced with a full-featured consistency protocol. In other cases, performance limitations and software and hardware changes prompted a redesign. Sequential communication tasks were parallelized; host-to-host MPI calls were replaced with GPU device-to-device MPI calls; more advanced algorithms such as Communication Avoiding LU and the Random Butterfly Transform (RBT) were introduced. Early choices that turned out to be cumbersome, error-prone, or inflexible have been replaced with simpler, more intuitive, or more flexible designs. Applications have been a driving force, prompting a lighter-weight queue class, nonuniform tile sizes, and more flexible MPI process grids. Of paramount importance has been building a portable library that works across several different GPU architectures – AMD, Intel, and NVIDIA – while keeping a clean and maintainable codebase. Here we explore the evolving design choices and their effects, both in terms of performance and software sustainability.

Sponsoring Organization:
USDOE
OSTI ID:
2479010
Journal Information:
International Journal of High Performance Computing Applications, Vol. 39, Issue 1; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English

