# A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

## Abstract

We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^(1/3)). In all cases, the additional memory needed to achieve these gains is a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.
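To make the idea of a three-dimensional process grid more concrete, the following is a minimal sketch of how linear process ranks might be arranged into a `pr × pc × pz` grid, where each index along the third dimension selects one 2D layer. The layout (layer by layer, row-major within a layer) and all names (`GridCoord`, `rank_to_coord`, `coord_to_rank`, `layer_members`) are assumptions made for illustration; they are not taken from the paper or from the SuperLU code.

```python
from typing import List, NamedTuple


class GridCoord(NamedTuple):
    row: int    # row within a 2D layer
    col: int    # column within a 2D layer
    layer: int  # index along the third dimension


def rank_to_coord(rank: int, pr: int, pc: int, pz: int) -> GridCoord:
    """Map a linear rank to (row, col, layer) in a pr x pc x pz grid.

    Assumed layout (hypothetical): ranks are numbered layer by layer,
    and row-major within each 2D layer.
    """
    if not 0 <= rank < pr * pc * pz:
        raise ValueError("rank out of range for this grid")
    layer, within_layer = divmod(rank, pr * pc)
    row, col = divmod(within_layer, pc)
    return GridCoord(row, col, layer)


def coord_to_rank(coord: GridCoord, pr: int, pc: int) -> int:
    """Inverse of rank_to_coord under the same assumed layout."""
    return (coord.layer * pr + coord.row) * pc + coord.col


def layer_members(layer: int, pr: int, pc: int) -> List[int]:
    """Return the ranks that form one 2D layer of the 3D grid."""
    start = layer * pr * pc
    return list(range(start, start + pr * pc))


if __name__ == "__main__":
    pr, pc, pz = 2, 3, 4  # 24 processes arranged as four 2 x 3 layers
    for rank in (0, 5, 6, 23):
        print(rank, rank_to_coord(rank, pr, pc, pz))
```

In a sketch like this, each layer is an ordinary 2D process grid, so a 2D data structure can be reused within a layer while the third dimension provides an extra axis along which independent work can be spread.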

- Authors:
- Sao, Piyush; Li, Xiaoye Sherry; Vuduc, Richard

- Publication Date:

- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)

- Sponsoring Org.:
- USDOE Office of Science (SC)

- OSTI Identifier:
- 1544235

- Resource Type:
- Conference

- Resource Relation:
- Conference: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 21-25 May 2018, Vancouver, BC, Canada

- Country of Publication:
- United States

- Language:
- English

### Citation Formats

```
Sao, Piyush, Li, Xiaoye Sherry, and Vuduc, Richard.
A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. United States: N. p., 2018.
Web. doi:10.1109/IPDPS.2018.00100.
```

```
Sao, Piyush, Li, Xiaoye Sherry, & Vuduc, Richard.
A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. United States. doi:10.1109/IPDPS.2018.00100.
```

```
Sao, Piyush, Li, Xiaoye Sherry, and Vuduc, Richard. Tue .
"A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices". United States. doi:10.1109/IPDPS.2018.00100.
```

```
@article{osti_1544235,
  title = {A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices},
  author = {Sao, Piyush and Li, Xiaoye Sherry and Vuduc, Richard},
  abstractNote = {We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^(1/3)). In all cases, the additional memory needed to achieve these gains is a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.},
  doi = {10.1109/IPDPS.2018.00100},
  journal = {},
  number = {},
  volume = {},
  place = {United States},
  year = {2018},
  month = {5}
}
```