Highly scalable distributed-memory sparse triangular solution algorithms.
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Scalable Solvers Group
This paper presents a highly efficient distributed-memory parallel sparse triangular solver. The triangular solution phase is often performed after the factorization phase in sparse linear solvers, and it has become increasingly computationally expensive for direct solvers with many right-hand sides (RHSs) and for preconditioned iterative solvers. However, the low arithmetic intensity and sequential nature of the triangular solve algorithm pose performance challenges for its large-scale distributed-memory parallelization. In this work, we propose several strategies to enhance the scalability of an algorithm with a 2D block cyclic process layout. First, an asynchronous binary-tree-based communication scheme implemented via non-blocking MPI functions is leveraged to broadcast partial solutions and reduce partial updates among a subset of processes for each block column and row of the triangular matrix, respectively. This scheme reduces message latency, improves communication load balance, and significantly accelerates asynchronous execution of the triangular solve. In addition, efficient BLAS operations and threading implementations are exploited to accelerate local computations and further reduce process idle time. The proposed strategies are implemented in SuperLU_DIST, and numerical experiments show up to 4.4x improvement with one right-hand side and up to 6.1x improvement with 50 right-hand sides on 4096 processes, compared to the current release. This is the first time that a sparse triangular solver has demonstrated strong scaling on more than 4000 cores.
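To illustrate why a binary tree helps, here is a minimal sequential simulation of the communication pattern the abstract describes. It is a sketch under stated assumptions rather than the SuperLU_DIST implementation. All function names (`owner`, `column_bcast_ranks`, `binary_tree`, `tree_broadcast`, `tree_reduce`) are hypothetical, and the following choices are assumptions of this sketch: row-major rank numbering for the 2D block cyclic grid, the diagonal-block owner as tree root, and heap-ordered tree construction. A real solver would replace the simulated messages with non-blocking MPI sends and receives. The key point is that each process forwards to at most two children, so a broadcast of the partial solution x_K over n processes finishes in about log2(n) rounds. A flat scheme instead makes the root send n-1 messages.

```python
def owner(I, J, Pr, Pc):
    """Rank owning block (I, J) in a Pr x Pc 2D block-cyclic layout.
    Row-major rank numbering is an assumption of this sketch."""
    return (I % Pr) * Pc + (J % Pc)


def column_bcast_ranks(K, nonzero_rows, Pr, Pc):
    """Ranks that need x_K: the diagonal-block owner first (tree root),
    then the distinct owners of the nonzero blocks L(I, K), I > K."""
    ranks = [owner(K, K, Pr, Pc)]
    for I in sorted(nonzero_rows):
        r = owner(I, K, Pr, Pc)
        if r not in ranks:
            ranks.append(r)
    return ranks


def binary_tree(ranks):
    """Heap-ordered binary tree over ranks; ranks[0] is the root.
    Position i has children at positions 2i+1 and 2i+2."""
    parent, children = {}, {}
    for i, r in enumerate(ranks):
        parent[r] = ranks[(i - 1) // 2] if i > 0 else None
        children[r] = [ranks[c] for c in (2 * i + 1, 2 * i + 2) if c < len(ranks)]
    return parent, children


def tree_broadcast(ranks, value):
    """Simulate a tree broadcast: each process forwards to its children.
    Returns the value received by each rank and the number of rounds."""
    _, children = binary_tree(ranks)
    received = {ranks[0]: value}
    frontier, rounds = [ranks[0]], 0
    while frontier:
        nxt = []
        for r in frontier:
            for c in children[r]:
                received[c] = received[r]  # one point-to-point message
                nxt.append(c)
        if nxt:
            rounds += 1
        frontier = nxt
    return received, rounds


def tree_reduce(ranks, partial):
    """Simulate a tree reduction: partial updates flow from the leaves
    toward the root, each process summing its subtree."""
    _, children = binary_tree(ranks)

    def subtree_sum(r):
        return partial[r] + sum(subtree_sum(c) for c in children[r])

    return subtree_sum(ranks[0])


if __name__ == "__main__":
    # Block column K = 1 on a 4 x 2 grid; only process column 1 is involved.
    ranks = column_bcast_ranks(1, [2, 3, 5, 6, 9], Pr=4, Pc=2)
    print("participating ranks:", ranks)
    _, rounds = tree_broadcast(list(range(8)), 1.5)
    print("rounds for 8 ranks:", rounds)
```

Because every block in column K maps to process column K mod Pc, the broadcast for x_K involves at most Pr processes. The row-wise reduction of partial updates is symmetric, with at most Pc processes in process row I mod Pr. Capping each process at two children spreads the sending work across the tree instead of concentrating it at the root.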
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1602817
- Resource Relation:
- Conference: SIAM Workshop on Combinatorial Scientific Computing, June 6-8, 2018, Bergen, Norway
- Country of Publication:
- United States
- Language:
- English
Similar Records
An asynchronous parallel linear equation solution technique
A communication-avoiding 3D sparse triangular solver