DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Distributed-memory lattice H-matrix factorization

Abstract

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performance using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.
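
The BLR idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1D kernel, the point set, the block size, the tolerance, and all function names are assumptions chosen for illustration, and a simple full-pivoting cross approximation stands in for whatever compression the paper uses. The matrix is cut into a flat grid of blocks, diagonal blocks are kept dense, and off-diagonal blocks are stored as low-rank factors. Under the lattice format, each grid block would instead be stored in the H-matrix format, which this sketch does not attempt.

```python
# Illustrative sketch only. The kernel, point set, block size, tolerance,
# and every function name here are assumptions, not taken from the paper.

def kernel(x, y):
    """Hypothetical smooth interaction kernel between two 1D points."""
    return 1.0 / (1.0 + abs(x - y))


def cross_approximation(block, tol):
    """Approximate block as sum_k outer(us[k], vs[k]) using full pivoting."""
    m, n = len(block), len(block[0])
    resid = [row[:] for row in block]  # explicit residual matrix
    us, vs = [], []
    first_pivot = None
    for _ in range(min(m, n)):
        # Full pivoting: pick the largest remaining residual entry.
        i, j = max(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: abs(resid[rc[0]][rc[1]]))
        pivot = resid[i][j]
        if first_pivot is None:
            first_pivot = abs(pivot)
        if abs(pivot) <= tol * first_pivot:
            break  # every residual entry is already below the threshold
        u = [resid[r][j] / pivot for r in range(m)]
        v = resid[i][:]
        for r in range(m):
            for c in range(n):
                resid[r][c] -= u[r] * v[c]
        us.append(u)
        vs.append(v)
    return us, vs


def build_blr(points, block_size, tol):
    """Return a grid of blocks: dense on the diagonal, low-rank elsewhere."""
    starts = range(0, len(points), block_size)
    grid = []
    for bi in starts:
        rows = points[bi:bi + block_size]
        grid_row = []
        for bj in starts:
            cols = points[bj:bj + block_size]
            dense = [[kernel(x, y) for y in cols] for x in rows]
            if bi == bj:
                grid_row.append(("dense", dense))
            else:
                grid_row.append(("lowrank", cross_approximation(dense, tol)))
        grid.append(grid_row)
    return grid


def storage(grid):
    """Count the floating-point entries stored in the BLR grid."""
    total = 0
    for grid_row in grid:
        for kind, data in grid_row:
            if kind == "dense":
                total += sum(len(row) for row in data)
            else:
                us, vs = data
                total += sum(len(u) for u in us) + sum(len(v) for v in vs)
    return total


def max_error(grid, points, block_size):
    """Largest entrywise difference between the BLR grid and the kernel."""
    worst = 0.0
    for bi, grid_row in enumerate(grid):
        for bj, (kind, data) in enumerate(grid_row):
            rows = points[bi * block_size:(bi + 1) * block_size]
            cols = points[bj * block_size:(bj + 1) * block_size]
            for r, x in enumerate(rows):
                for c, y in enumerate(cols):
                    if kind == "dense":
                        approx = data[r][c]
                    else:
                        us, vs = data
                        approx = sum(u[r] * v[c] for u, v in zip(us, vs))
                    worst = max(worst, abs(approx - kernel(x, y)))
    return worst


n, block_size, tol = 128, 32, 1e-8
points = [i / n for i in range(n)]
blr = build_blr(points, block_size, tol)
dense_entries = n * n
blr_entries = storage(blr)
blr_error = max_error(blr, points, block_size)
print(dense_entries, blr_entries, blr_error)
```

The printed values compare the number of stored entries for the dense and BLR representations and report the largest entrywise error of the compressed form. The flat grid is what makes a BLR factorization straightforward to schedule block by block; the price, as the abstract notes, is that a single level of blocking does not retain the near-linear complexity of a hierarchical format.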

Authors:
Yamazaki, Ichitaro [1]; Ida, Akihiro [2]; Yokota, Rio [3]; Dongarra, Jack [4]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. The Univ. of Tokyo, Tokyo (Japan)
  3. Tokyo Inst. of Technology, Tokyo (Japan)
  4. The Univ. of Tennessee, Knoxville, TN (United States)
Publication Date:
2019-08-01
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1559494
Report Number(s):
SAND-2019-8102J
Journal ID: ISSN 1094-3420; 677691
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 33; Journal Issue: 5; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; boundary element method; LU factorization; distributed memory; hierarchical matrix; task programming

Citation Formats

Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States: N. p., 2019. Web. doi:10.1177/1094342019861139.
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, & Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States. https://doi.org/10.1177/1094342019861139
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. 2019. "Distributed-memory lattice H-matrix factorization". United States. https://doi.org/10.1177/1094342019861139. https://www.osti.gov/servlets/purl/1559494.
@article{osti_1559494,
title = {Distributed-memory lattice H-matrix factorization},
author = {Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio and Dongarra, Jack},
abstractNote = {We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performance using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.},
doi = {10.1177/1094342019861139},
journal = {International Journal of High Performance Computing Applications},
number = 5,
volume = 33,
place = {United States},
year = {2019},
month = {8}
}


Citation Metrics:
Cited by: 11 works
Citation information provided by
Web of Science

Figures / Tables:

Fig. 1: Conceptual diagram of the construction of representative low-rank structured matrices using different partition structure $\mathscr{M}$: (a) $\mathscr{M}_{\mathscr{H}}$ for a general $\mathscr{H}$-matrix, (b) $\mathscr{M}_{W}$ for a Hierarchical Semi-Separable (HSS), (c) $\mathscr{M}_{L}$ for a Block Low-Rank (BLR), and (d) $\mathscr{M}_{L\mathscr{H}}$ for a lattice $\mathscr{H}$-matrix. Blank boxes show non-admissible blocks. Blocks in light red indicate submatrices judged as low-rank, and blocks painted in deep red are remaining non-admissible blocks calculated as dense submatrices.

