skip to main content

SciTech ConnectSciTech Connect

Title: A Flexible CUDA LU-based Solver for Small, Batched Linear Systems

This chapter presents the implementation of a batched CUDA solver based on LU factorization for small linear systems. This solver may be used in applications such as reactive flow transport models, which apply the Newton-Raphson technique to linearize and iteratively solve the sets of non linear equations that represent the reactions for ten of thousands to millions of physical locations. The implementation exploits somewhat counterintuitive GPGPU programming techniques: it assigns the solution of a matrix (representing a system) to a single CUDA thread, does not exploit shared memory and employs dynamic memory allocation on the GPUs. These techniques enable our implementation to simultaneously solve sets of systems with over 100 equations and to employ LU decomposition with complete pivoting, providing the higher numerical accuracy required by certain applications. Other currently available solutions for batched linear solvers are limited by size and only support partial pivoting, although they may result faster in certain conditions. We discuss the code of our implementation and present a comparison with the other implementations, discussing the various tradeoffs in terms of performance and flexibility. This work will enable developers that need batched linear solvers to choose whichever implementation is more appropriate to the features and themore » requirements of their applications, and even to implement dynamic switching approaches that can choose the best implementation depending on the input data.« less
; ;
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Resource Relation:
Related Information: Numerical Computations with GPUs, 87-101
V Kindratenko; Springer International Publishing, Cham, Switzerland.
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
Country of Publication:
United States
GPUs; LU decomposition; bached linear solvers