Highly scalable distributed-memory sparse triangular solution algorithms.
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Scalable Solvers Group
This paper presents a highly efficient distributed-memory parallel sparse triangular solver. The triangular solution phase usually follows the factorization phase in sparse linear solvers, and it has become increasingly expensive for direct solvers with many right-hand sides (RHSs) and for preconditioned iterative solvers. However, the low arithmetic intensity and sequential nature of the triangular solve algorithm make it difficult to parallelize at large scale on distributed-memory systems. In this work, we propose several strategies to enhance the scalability of an algorithm with a 2D block-cyclic process layout. First, an asynchronous binary-tree-based communication scheme, implemented with non-blocking MPI functions, broadcasts partial solutions among the subset of processes that share each block column of the triangular matrix and reduces partial updates among the subset that shares each block row. This scheme reduces message latency, improves communication load balance, and significantly accelerates asynchronous execution of the triangular solve. In addition, efficient BLAS operations and threading implementations are exploited to accelerate local computations and further reduce process idle time. The proposed strategies are implemented in SuperLU_DIST, and numerical experiments show up to 4.4x improvement with one right-hand side and up to 6.1x improvement with 50 right-hand sides on 4096 processes, compared to the current release. This is the first time that a sparse triangular solver has demonstrated strong scaling on more than 4000 cores.
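The tree-based broadcast along a block column can be illustrated with a small, MPI-free sketch. It is not the SuperLU_DIST implementation: the function names, the row-major rank numbering, and the assumption that every block below the diagonal is nonzero (so every owning process takes part) are all hypothetical simplifications chosen for illustration. The sketch maps blocks to processes under a 2D block-cyclic layout, collects the processes that share one block column, and lists which process forwards the partial solution to which along a binary tree.

```python
def block_owner(i, j, pr, pc):
    """Rank owning block (i, j) on a pr x pc grid (2D block-cyclic,
    row-major rank numbering; the numbering is an assumption)."""
    return (i % pr) * pc + (j % pc)


def column_group(k, nblocks, pr, pc):
    """Distinct ranks owning blocks of block column k on or below the
    diagonal, with the diagonal owner (the broadcast root) first.
    Assumes all of these blocks are nonzero, unlike a real sparse matrix."""
    group = [block_owner(k, k, pr, pc)]
    for i in range(k + 1, nblocks):
        rank = block_owner(i, k, pr, pc)
        if rank not in group:
            group.append(rank)
    return group


def tree_children(pos, size):
    """Child positions of node `pos` in a binary tree laid out over
    positions 0..size-1 (heap-style indexing)."""
    return [c for c in (2 * pos + 1, 2 * pos + 2) if c < size]


def broadcast_edges(group):
    """(sender, receiver) rank pairs for a binary-tree broadcast over `group`."""
    n = len(group)
    return [(group[p], group[c]) for p in range(n) for c in tree_children(p, n)]


def tree_depth(size):
    """Number of forwarding hops needed to reach the last position."""
    depth, pos = 0, size - 1
    while pos > 0:
        pos = (pos - 1) // 2
        depth += 1
    return depth


if __name__ == "__main__":
    group = column_group(k=0, nblocks=16, pr=4, pc=2)
    print("processes sharing block column 0:", group)
    print("tree broadcast edges:", broadcast_edges(group))
    print("messages sent by root: flat =", len(group) - 1,
          ", tree =", len(tree_children(0, len(group))))
    print("tree depth:", tree_depth(len(group)))
```

In a flat broadcast the diagonal owner sends one message to every other process in the group, whereas in the tree each process forwards to at most two others, which is one way to read the abstract's claim about reduced latency and better communication load balance. A reduction along a block row could reuse the same tree with the edges reversed.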
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1602817
- Country of Publication:
- United States
- Language:
- English
Similar Records
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
Journal Article · Sun Aug 18 20:00:00 EDT 2019 · Journal of Parallel and Distributed Computing · OSTI ID:1559632
An asynchronous parallel linear equation solution technique
Conference · Sat Dec 30 23:00:00 EST 1995 · OSTI ID:153092
A communication-avoiding 3D sparse triangular solver
Conference · Sat Jun 01 00:00:00 EDT 2019 · OSTI ID:1558528