OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: A communication-avoiding 3D sparse triangular solver

Abstract

We present a novel distributed-memory algorithm to improve the strong scalability of the solution of a sparse triangular system. This operation appears in the solve phase of direct methods for solving general sparse linear systems, Ax = b. Our 3D sparse triangular solver employs several techniques, including a 3D MPI process grid, elimination tree parallelism, and data replication, which together reduce the per-process communication. We present analytical models of the communication cost of our algorithm and show that our 3D sparse triangular solver can reduce the per-process communication volume asymptotically by factors of O(n^{1/4}) and O(n^{1/6}) for problems arising from the finite element discretizations of 2D "planar" and 3D "non-planar" PDEs, respectively. We implement our algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model. When run on 12k cores of a Cray XC30, our 3D triangular solve algorithm outperforms the current state-of-the-art 2D algorithm by 7.2x for planar and 2.7x for non-planar sparse matrices.
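At its core, the operation being parallelized is forward substitution with a sparse lower-triangular factor L. The sketch below is a minimal sequential illustration in Python, assuming L is stored in CSR (compressed sparse row) arrays with an explicit diagonal entry in every row; the names `csr_lower_solve` and `level_sets` are hypothetical and do not come from SuperLU_DIST3D. It is not the paper's 3D algorithm: there is no process grid, supernodal blocking, or replication. The second function computes level sets (level scheduling, a common shared-memory technique), a simpler analogue of the independence that elimination tree parallelism exploits between disjoint subtrees.

```python
def csr_lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution.

    Assumes L is lower triangular in CSR form: row i holds entries
    data[indptr[i]:indptr[i+1]] at columns indices[indptr[i]:indptr[i+1]],
    every column index is <= i, and the diagonal entry is stored.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            else:
                # x[j] is already known because j < i in a lower-triangular row
                s -= data[k] * x[j]
        if diag is None or diag == 0.0:
            raise ValueError(f"zero or missing diagonal in row {i}")
        x[i] = s / diag
    return x


def level_sets(indptr, indices):
    """Group rows into levels for level scheduling.

    Row i depends on row j (j < i) whenever L[i, j] != 0. Rows that share a
    level have no dependencies on one another and could be solved concurrently.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    nlev = max(level) + 1 if n else 0
    groups = [[] for _ in range(nlev)]
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


# Example: L = [[2,0,0,0], [1,3,0,0], [0,0,4,0], [0,2,1,5]] in CSR form
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 3.0, 4.0, 2.0, 1.0, 5.0]
b = [2.0, 4.0, 8.0, 9.0]

print(csr_lower_solve(indptr, indices, data, b))  # [1.0, 1.0, 2.0, 1.0]
print(level_sets(indptr, indices))                # [[0, 2], [1], [3]]
```

In the example, rows 0 and 2 land in the same level because neither depends on the other, much as two disjoint subtrees of an elimination tree can be processed independently; row 3 must wait for both of its predecessors.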

Authors:
Sao, Piyush [1]; Kannan, Ramakrishnan [1]; Li, Xiaoye Sherry [2]; Vuduc, Richard [3]
  1. ORNL
  2. Lawrence Berkeley National Laboratory (LBNL)
  3. Georgia Institute of Technology, Atlanta
Publication Date:
June 2019
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1558528
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: International Conference on Supercomputing (ICS 2019), Phoenix, Arizona, United States of America, June 26-28, 2019
Country of Publication:
United States
Language:
English

Citation Formats

Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, and Vuduc, Richard. A communication-avoiding 3D sparse triangular solver. United States: N. p., 2019. Web. doi:10.1145/3330345.3330357.
Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, & Vuduc, Richard. A communication-avoiding 3D sparse triangular solver. United States. https://doi.org/10.1145/3330345.3330357
Sao, Piyush, Kannan, Ramakrishnan, Li, Xiaoye Sherry, and Vuduc, Richard. 2019. "A communication-avoiding 3D sparse triangular solver". United States. https://doi.org/10.1145/3330345.3330357. https://www.osti.gov/servlets/purl/1558528.
@inproceedings{osti_1558528,
title = {A communication-avoiding 3D sparse triangular solver},
author = {Sao, Piyush and Kannan, Ramakrishnan and Li, Xiaoye Sherry and Vuduc, Richard},
abstractNote = {We present a novel distributed-memory algorithm to improve the strong scalability of the solution of a sparse triangular system. This operation appears in the solve phase of direct methods for solving general sparse linear systems, Ax = b. Our 3D sparse triangular solver employs several techniques, including a 3D MPI process grid, elimination tree parallelism, and data replication, which together reduce the per-process communication. We present analytical models of the communication cost of our algorithm and show that our 3D sparse triangular solver can reduce the per-process communication volume asymptotically by factors of O(n^{1/4}) and O(n^{1/6}) for problems arising from the finite element discretizations of 2D "planar" and 3D "non-planar" PDEs, respectively. We implement our algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model. When run on 12k cores of a Cray XC30, our 3D triangular solve algorithm outperforms the current state-of-the-art 2D algorithm by 7.2x for planar and 2.7x for non-planar sparse matrices.},
booktitle = {International Conference on Supercomputing (ICS 2019)},
doi = {10.1145/3330345.3330357},
url = {https://www.osti.gov/biblio/1558528},
place = {United States},
year = {2019},
month = {6}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
