DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload

Journal Article · International Journal of High Performance Computing Applications
[1]; [2]; [3]; [4]; [5]
  1. MIT Lincoln Lab, LLSC, MIT Lincoln Laboratory, Lexington, MA, USA; Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA
  2. Technical Development, Synopsys, Inc., Sunnyvale, CA, USA
  3. ML Compilers and AI Accelerators, Meta, Inc., Menlo Park, CA, USA
  4. Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA
  5. Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA; Computer Science and Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Applied Mathematics, University of Manchester, Manchester, UK

Performing a variety of numerical computations efficiently and, at the same time, portably requires both an overarching design and a number of implementation strategies. We exemplify both as we present the transition of the PLASMA numerical library from dependence-driven large tasks to fine-grained tasking and offload to hardware accelerators, while keeping its core set of dependencies: OpenMP source code pragmas and runtime for most system-level functionality, and basic low-level numerical kernels provided directly by hardware vendors or by open source projects with vendor contributions. We also present new algorithmic methods and their efficient parallel implementations, including fine-grained tasking for eigen-spectrum slicing and offload for mixed-precision eigenvalue refinement. We provide performance, scaling, and numerical results showing sizable gains over the available solutions from both open source and vendor-provided packages.
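As a rough illustration of why eigen-spectrum slicing lends itself to fine-grained tasking, the sketch below computes the eigenvalues of a symmetric tridiagonal matrix by bisection on Sturm counts, with each slice of eigenvalue indices handled as an independent task. This is a generic textbook technique and not the article's implementation. The function names (`sturm_count`, `kth_eigenvalue`, `eigenvalues_by_slices`) are hypothetical, and a Python thread pool stands in for the OpenMP tasks mentioned in the abstract only to show that the slices share no data dependences.

```python
from concurrent.futures import ThreadPoolExecutor


def sturm_count(d, e, x):
    """Count eigenvalues strictly below x for the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e (len(e) == len(d) - 1)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        # Pivot recurrence of the LDL^T factorization of (T - x*I).
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30  # assumption: tiny perturbation to avoid dividing by zero
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(d, e, k, lo, hi, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (0-indexed) in [lo, hi]."""
    for _ in range(200):  # hard cap so the loop always terminates
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid  # eigenvalue k lies below mid
        else:
            lo = mid  # eigenvalue k lies at or above mid
    return 0.5 * (lo + hi)


def eigenvalues_by_slices(d, e, num_slices=4):
    """Split the index range into slices and solve each slice independently."""
    n = len(d)
    # Gershgorin bounds enclose the whole spectrum.
    radius = [
        (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(d[i] - radius[i] for i in range(n)) - 1e-9
    hi = max(d[i] + radius[i] for i in range(n)) + 1e-9

    chunk = -(-n // num_slices)  # ceiling division
    slices = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]

    def solve(indices):
        return [kth_eigenvalue(d, e, k, lo, hi) for k in indices]

    # Slices do not depend on each other, so they can run as separate tasks.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(solve, slices))
    return [value for part in parts for value in part]
```

For example, `eigenvalues_by_slices([2.0] * 8, [-1.0] * 7, num_slices=3)` returns the eight eigenvalues of the 1-D discrete Laplacian in ascending order. The absolute tolerance and the zero-pivot perturbation are simplifications suited to small, well-scaled inputs.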

Sponsoring Organization:
USDOE
OSTI ID:
2447519
Journal Information:
Journal Name: International Journal of High Performance Computing Applications; Journal Volume: 38; Journal Issue: 6; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English
