A communication-avoiding 3D sparse triangular solver
Abstract
We present a novel distributed memory algorithm to improve the strong scalability of the solution of a sparse triangular system. This operation appears in the solve phase of direct methods for solving general sparse linear systems, Ax = b. Our 3D sparse triangular solver employs several techniques, including a 3D MPI process grid, elimination tree parallelism, and data replication, which together reduce the per-process communication. We present analytical models to understand the communication cost of our algorithm and show that our 3D sparse triangular solver can reduce the per-process communication volume asymptotically by a factor of $O(n^{1/4})$ and $O(n^{1/6})$ for problems arising from the finite element discretizations of 2D "planar" and 3D "non-planar" PDEs, respectively. We implement our algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model. When run on 12k cores of a Cray XC30, our 3D triangular solve algorithm outperforms the current state-of-the-art 2D algorithm by 7.2x for planar and 2.7x for non-planar sparse matrices.
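For readers unfamiliar with the operation the abstract refers to, the following is a minimal sequential sketch of a sparse lower-triangular solve (forward substitution) over a matrix in compressed sparse row (CSR) form. It is not the 3D distributed algorithm of the paper and not SuperLU_DIST code; the function name `forward_substitution_csr` and the requirement that each row stores its diagonal entry last are assumptions made only for this illustration.

```python
def forward_substitution_csr(indptr, indices, data, b):
    """Solve L x = b for a sparse lower-triangular L stored in CSR form.

    Illustrative assumption: column indices in each row are sorted,
    so the diagonal entry is the last stored entry of the row.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        # The diagonal must be present and nonzero for the solve to proceed.
        if end == start or indices[end - 1] != i:
            raise ValueError(f"row {i} has no stored diagonal entry")
        diag = data[end - 1]
        if diag == 0.0:
            raise ZeroDivisionError(f"zero diagonal in row {i}")
        # Subtract contributions of already-computed unknowns x[j], j < i.
        acc = b[i]
        for k in range(start, end - 1):
            acc -= data[k] * x[indices[k]]
        x[i] = acc / diag
    return x


# Example system:
# L = [[2, 0, 0],
#      [1, 3, 0],
#      [0, 4, 5]]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
b = [2.0, 7.0, 23.0]
x = forward_substitution_csr(indptr, indices, data, b)
print(x)
```

Each unknown x[i] depends on earlier unknowns x[j] with a nonzero L[i][j]. In a distributed setting those dependencies become messages between processes, which is the per-process communication the abstract aims to reduce.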
- Authors:
- Sao, Piyush; Kannan, Ramakrishnan; Li, Xiaoye Sherry; Vuduc, Richard
- ORNL
- Lawrence Berkeley National Laboratory (LBNL)
- Georgia Institute of Technology, Atlanta
- Publication Date:
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1558528
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: International Conference on Supercomputing (ICS 2019), Phoenix, Arizona, United States of America, June 26-28, 2019
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, and Vuduc, Richard. A communication-avoiding 3D sparse triangular solver. United States: N. p., 2019. Web. doi:10.1145/3330345.3330357.
Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, & Vuduc, Richard. A communication-avoiding 3D sparse triangular solver. United States. https://doi.org/10.1145/3330345.3330357
Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, and Vuduc, Richard. 2019. "A communication-avoiding 3D sparse triangular solver". United States. https://doi.org/10.1145/3330345.3330357. https://www.osti.gov/servlets/purl/1558528.
@article{osti_1558528,
title = {A communication-avoiding 3D sparse triangular solver},
author = {Sao, Piyush and Kannan, Ramakrishnan and Li, Xiaoye Sherry and Vuduc, Richard},
abstractNote = {We present a novel distributed memory algorithm to improve the strong scalability of the solution of a sparse triangular system. This operation appears in the solve phase of direct methods for solving general sparse linear systems, Ax = b. Our 3D sparse triangular solver employs several techniques, including a 3D MPI process grid, elimination tree parallelism, and data replication, which together reduce the per-process communication. We present analytical models to understand the communication cost of our algorithm and show that our 3D sparse triangular solver can reduce the per-process communication volume asymptotically by a factor of $O(n^{1/4})$ and $O(n^{1/6})$ for problems arising from the finite element discretizations of 2D "planar" and 3D "non-planar" PDEs, respectively. We implement our algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model. When run on 12k cores of a Cray XC30, our 3D triangular solve algorithm outperforms the current state-of-the-art 2D algorithm by 7.2x for planar and 2.7x for non-planar sparse matrices.},
doi = {10.1145/3330345.3330357},
url = {https://www.osti.gov/biblio/1558528},
place = {United States},
year = {2019},
month = {6}
}