## This content will become publicly available on August 1, 2020

# Distributed-memory lattice H-matrix factorization

## Abstract

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performances using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorization on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.
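The formats named in the abstract (H-matrix, BLR, lattice H-matrix) all rely on storing admissible off-diagonal blocks as low-rank products instead of dense arrays. The sketch below illustrates only that building block, not the authors' factorization. It assumes a hypothetical test block with entries 1/|y - x| for two well-separated 1D point clusters, and it uses a simple fully pivoted cross approximation as a stand-in for whatever compression the paper actually uses. The function names `kernel_block`, `cross_approximation`, and `max_error` are made up for this illustration.

```python
def kernel_block(n):
    """Dense n-by-n block of 1/|y - x| for two well-separated 1D clusters (assumed test problem)."""
    xs = [j / n for j in range(n)]          # source points in [0, 1)
    ys = [2.0 + i / n for i in range(n)]    # target points in [2, 3)
    return [[1.0 / (y - x) for x in xs] for y in ys]


def cross_approximation(a, tol):
    """Return lists (us, vs) such that a is approximately sum_k outer(us[k], vs[k])."""
    m, n = len(a), len(a[0])
    residual = [row[:] for row in a]
    scale = max(abs(v) for row in a for v in row)
    us, vs = [], []
    while len(us) < min(m, n):
        # Full pivoting: pick the largest remaining residual entry.
        pivot, pi, pj = 0.0, 0, 0
        for i in range(m):
            for j in range(n):
                if abs(residual[i][j]) > abs(pivot):
                    pivot, pi, pj = residual[i][j], i, j
        # Stop once every residual entry is below the relative tolerance.
        if abs(pivot) <= tol * scale:
            break
        u = [residual[i][pj] / pivot for i in range(m)]  # scaled pivot column
        v = residual[pi][:]                              # pivot row
        us.append(u)
        vs.append(v)
        # Rank-1 update of the residual.
        for i in range(m):
            for j in range(n):
                residual[i][j] -= u[i] * v[j]
    return us, vs


def max_error(a, us, vs):
    """Largest entrywise difference between a and its low-rank reconstruction."""
    err = 0.0
    for i in range(len(a)):
        for j in range(len(a[0])):
            approx = sum(u[i] * v[j] for u, v in zip(us, vs))
            err = max(err, abs(a[i][j] - approx))
    return err


n = 32
block = kernel_block(n)
us, vs = cross_approximation(block, tol=1e-6)
rank = len(us)
dense_entries = n * n            # storage for the dense block
low_rank_entries = rank * 2 * n  # storage for the two rank-k factors
print(f"rank {rank}: dense {dense_entries} entries, low-rank {low_rank_entries} entries")
print(f"max entrywise error {max_error(block, us, vs):.2e}")
```

In a BLR layout every off-diagonal tile would be compressed this way at a single level, whereas the H-matrix and lattice H-matrix formats apply the same idea recursively inside blocks. Full pivoting needs the whole block in memory, so it is used here only because it keeps the sketch short.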

- Authors:
- Yamazaki, Ichitaro; Ida, Akihiro; Yokota, Rio; Dongarra, Jack

- Author Affiliations:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- The Univ. of Tokyo, Tokyo (Japan)
- Tokyo Inst. of Technology, Tokyo (Japan)
- The Univ. of Tennessee, Knoxville, TN (United States)

- Publication Date:
- August 2019

- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)

- OSTI Identifier:
- 1559494

- Report Number(s):
- SAND-2019-8102J

- Journal ID: ISSN 1094-3420; 677691

- Grant/Contract Number:
- AC04-94AL85000

- Resource Type:
- Accepted Manuscript

- Journal Name:
- International Journal of High Performance Computing Applications

- Additional Journal Information:
- Journal Volume: 33; Journal Issue: 5; Journal ID: ISSN 1094-3420

- Publisher:
- SAGE

- Country of Publication:
- United States

- Language:
- English

- Subject:
- 97 MATHEMATICS AND COMPUTING; boundary element method; LU factorization; distributed memory; hierarchical matrix; task programming

### Citation Formats

```
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States: N. p., 2019.
Web. doi:10.1177/1094342019861139.
```

```
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, & Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States. doi:10.1177/1094342019861139.
```

```
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. 2019.
"Distributed-memory lattice H-matrix factorization". United States. doi:10.1177/1094342019861139.
```

```
@article{osti_1559494,
  title = {Distributed-memory lattice H-matrix factorization},
  author = {Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio and Dongarra, Jack},
  abstractNote = {We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performances using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorization on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.},
  doi = {10.1177/1094342019861139},
  journal = {International Journal of High Performance Computing Applications},
  number = 5,
  volume = 33,
  place = {United States},
  year = {2019},
  month = {8}
}
```
