Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs

Kurzak, Jakub; Anzt, Hartwig; Gates, Mark; Dongarra, Jack

doi:10.1109/tpds.2015.2481890

Journal Article · July 1, 2016 · IEEE Transactions on Parallel and Distributed Systems

DOI: https://doi.org/10.1109/tpds.2015.2481890 · OSTI ID: 1565512

Many problems in engineering and scientific computing require the solution of a large number of small systems of linear equations. Because of their high processing power, Graphics Processing Units (GPUs) have become an attractive target for this class of problems, and NVIDIA provides routines based on the LU and QR factorizations in the cuBLAS library. This work addresses the case where the systems of equations are symmetric positive definite. The paper describes the implementation and tuning of kernels for the Cholesky factorization and for the forward and backward substitution. Targeted workloads involve the solution of thousands of linear systems of the same size, with a focus on matrix dimensions from 5 by 5 to 100 by 100. Because cuBLAS lacks a Cholesky factorization, the execution rates of cuBLAS LU and cuBLAS QR serve as the baseline for the proposed Cholesky factorization. Execution rates of the forward and backward substitution routines are compared to those of the equivalent cuBLAS routines. Comparisons against optimized multicore implementations are also presented. Superior performance is reached in all cases.
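For readers unfamiliar with the operations named in the abstract, the following is a minimal sequential sketch of a batched Cholesky factorization followed by forward and backward substitution. It is not the paper's GPU implementation and says nothing about how the authors' kernels are organized or tuned; it only illustrates the arithmetic that each small system requires. The function names (`cholesky_in_place`, `solve_in_place`, `batched_cholesky_solve`) and the list-of-lists storage are assumptions made for illustration.

```python
import math


def cholesky_in_place(a, n):
    """Overwrite the lower triangle of the n-by-n SPD matrix `a` with L, where A = L * L^T.

    Hypothetical helper for illustration; the strict upper triangle is left untouched.
    """
    for j in range(n):
        # Diagonal entry: remove the contribution of the columns already factored.
        d = a[j][j] - sum(a[j][k] * a[j][k] for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        a[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = s / a[j][j]


def solve_in_place(l, b, n):
    """Overwrite `b` with the solution x of (L * L^T) x = b, given L in the lower triangle of `l`."""
    # Forward substitution: solve L y = b.
    for i in range(n):
        b[i] = (b[i] - sum(l[i][k] * b[k] for k in range(i))) / l[i][i]
    # Backward substitution: solve L^T x = y, reading L^T from the lower triangle.
    for i in range(n - 1, -1, -1):
        b[i] = (b[i] - sum(l[k][i] * b[k] for k in range(i + 1, n))) / l[i][i]


def batched_cholesky_solve(matrices, rhs):
    """Factor and solve every small system in the batch, one after another.

    On a GPU the systems would be processed concurrently; this loop is only a
    sequential reference for the same arithmetic.
    """
    for a, b in zip(matrices, rhs):
        n = len(a)
        cholesky_in_place(a, n)
        solve_in_place(a, b, n)
    return rhs


if __name__ == "__main__":
    batch_a = [
        [[4.0, 2.0], [2.0, 3.0]],
        [[9.0, 3.0], [3.0, 5.0]],
    ]
    batch_b = [[2.0, 1.0], [12.0, 8.0]]
    for x in batched_cholesky_solve(batch_a, batch_b):
        print(x)
```

The sketch works in place, so each matrix is overwritten by its Cholesky factor and each right-hand side by its solution. Because every system in the batch is independent of the others, the outer loop is the natural place to introduce parallelism.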

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science

DOE Contract Number: SC0010042

OSTI ID: 1565512

Journal Information: IEEE Transactions on Parallel and Distributed Systems, Vol. 27, Issue 7; ISSN 1045-9219

Publisher: IEEE

Country of Publication: United States

Language: English

Similar Records

Batched matrix computations on hardware accelerators based on GPUs

Journal Article · Sun Feb 08 23:00:00 EST 2015 · International Journal of High Performance Computing Applications · OSTI ID:1361289

Towards Batched Linear Solvers on Accelerated Hardware Platforms

Book · Wed Dec 31 23:00:00 EST 2014 · OSTI ID:1261494

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

Book · Wed Dec 31 23:00:00 EST 2014 · OSTI ID:1261481

Related Subjects

Computer Science
Engineering
