
Title: A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

Abstract

We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^{1/3}). In all cases, the memory needed to achieve these gains increases by only a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.
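To make the idea of a three-dimensional process grid concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `rank_to_coord` and `layers_for_node`, the row-major rank layout, and the use of a complete binary elimination tree are all hypothetical choices made for this example. The sketch treats the 3D grid as `pz` stacked 2D layers of `px * py` processes and shows one plausible way independent subtrees of an elimination tree could be given to separate layers while their common ancestors are shared by several layers.

```python
from typing import NamedTuple


class GridCoord(NamedTuple):
    x: int  # column within a 2D layer
    y: int  # row within a 2D layer
    z: int  # index of the 2D layer


def rank_to_coord(rank: int, px: int, py: int, pz: int) -> GridCoord:
    """Map a linear rank to (x, y, z) in a px * py * pz grid.

    Assumption: ranks are laid out layer by layer, row-major inside a layer.
    """
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside the process grid")
    z, within_layer = divmod(rank, px * py)
    y, x = divmod(within_layer, px)
    return GridCoord(x, y, z)


def layers_for_node(depth: int, index: int, pz: int) -> range:
    """Return the layers that hold tree node `index` at `depth` (root is depth 0).

    Assumption: a complete binary elimination tree, with pz divisible by
    2**depth. Deeper nodes are spread over fewer layers; the root is held
    by every layer.
    """
    nodes_at_depth = 1 << depth
    if pz % nodes_at_depth != 0 or not 0 <= index < nodes_at_depth:
        raise ValueError("node does not fit this layer count")
    width = pz // nodes_at_depth
    return range(index * width, (index + 1) * width)


if __name__ == "__main__":
    print(rank_to_coord(5, px=2, py=2, pz=2))
    for depth in range(3):
        for index in range(1 << depth):
            print(depth, index, list(layers_for_node(depth, index, pz=4)))
```

In this toy mapping, the four nodes at depth 2 each sit on a single layer and can be processed without talking to other layers, while the root is held by all four layers. That replication of ancestor nodes is one way to picture the memory-for-communication trade the abstract describes.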

Authors:
Sao, Piyush; Li, Xiaoye Sherry; Vuduc, Richard
Publication Date:
May 2018
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1544235
Resource Type:
Conference
Resource Relation:
Conference: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 21-25 May 2018, Vancouver, BC, Canada
Country of Publication:
United States
Language:
English

Citation Formats

Sao, Piyush, Li, Xiaoye Sherry, and Vuduc, Richard. A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. United States: N. p., 2018. Web. doi:10.1109/IPDPS.2018.00100.
Sao, Piyush, Li, Xiaoye Sherry, & Vuduc, Richard. A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. United States. doi:10.1109/IPDPS.2018.00100.
Sao, Piyush, Li, Xiaoye Sherry, and Vuduc, Richard. 2018. "A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices". United States. doi:10.1109/IPDPS.2018.00100.
@article{osti_1544235,
title = {A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices},
author = {Sao, Piyush and Li, Xiaoye Sherry and Vuduc, Richard},
abstractNote = {We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^{1/3}). In all cases, the memory needed to achieve these gains increases by only a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.},
doi = {10.1109/IPDPS.2018.00100},
place = {United States},
year = {2018},
month = {5}
}

