Distributed-memory lattice H-matrix factorization
Abstract
We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than parallelizing the H-matrix-vector multiplication because of the dataflow of the factorization, and much harder than parallelizing a dense matrix factorization because of the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare the factorization performance of the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. Finally, we compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.
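To make the format concrete, here is a minimal Python sketch, not the authors' implementation, of a lattice H-matrix block structure and its storage count. The assumptions are ours, not the paper's: index clusters are 1-D intervals bisected at the midpoint, intervals stand in for geometric clusters in a strong admissibility test min(diam) ≤ η·dist with η = 1, and every low-rank leaf is stored as U Vᵀ with a fixed rank k. The names `build_hmatrix`, `build_lattice`, `storage`, and `lattice_storage` are hypothetical. Setting the lattice size p = 1 recovers a plain H-matrix, while choosing the leaf size at least as large as a lattice block removes the hierarchy inside each block and gives a BLR-like layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open index range [lo, hi)


@dataclass
class Block:
    rows: Interval
    cols: Interval
    kind: str  # "dense", "lowrank", or "hier" (subdivided further)
    children: List["Block"] = field(default_factory=list)


def admissible(r: Interval, c: Interval, eta: float = 1.0) -> bool:
    """Strong admissibility on 1-D intervals (assumption): min(diam) <= eta * dist."""
    dist = max(c[0] - r[1], r[0] - c[1])  # gap between the ranges; <= 0 if they touch
    diam = min(r[1] - r[0], c[1] - c[0])
    return dist > 0 and diam <= eta * dist


def build_hmatrix(r: Interval, c: Interval, leaf: int) -> Block:
    """Recursively partition the block r x c into an H-matrix block tree."""
    if admissible(r, c):
        return Block(r, c, "lowrank")
    if r[1] - r[0] <= leaf or c[1] - c[0] <= leaf:
        return Block(r, c, "dense")
    rm, cm = (r[0] + r[1]) // 2, (c[0] + c[1]) // 2
    kids = [build_hmatrix(rr, cc, leaf)
            for rr in ((r[0], rm), (rm, r[1]))
            for cc in ((c[0], cm), (cm, c[1]))]
    return Block(r, c, "hier", kids)


def build_lattice(n: int, p: int, leaf: int) -> List[List[Block]]:
    """Cut [0, n) into a p x p lattice and store every lattice block as an H-matrix."""
    cuts = [i * n // p for i in range(p + 1)]
    tiles = [(cuts[i], cuts[i + 1]) for i in range(p)]
    return [[build_hmatrix(ti, tj, leaf) for tj in tiles] for ti in tiles]


def storage(b: Block, k: int) -> int:
    """Number of stored entries, assuming every low-rank leaf has rank k."""
    m, n = b.rows[1] - b.rows[0], b.cols[1] - b.cols[0]
    if b.kind == "dense":
        return m * n
    if b.kind == "lowrank":
        return k * (m + n)  # U is m x k, V is n x k
    return sum(storage(child, k) for child in b.children)


def lattice_storage(lattice: List[List[Block]], k: int) -> int:
    return sum(storage(blk, k) for row in lattice for blk in row)


if __name__ == "__main__":
    n, k = 1024, 8
    print("H-matrix (p=1):", lattice_storage(build_lattice(n, 1, 32), k))
    print("BLR-like (p=8):", lattice_storage(build_lattice(n, 8, 128), k))
    print("lattice  (p=8):", lattice_storage(build_lattice(n, 8, 32), k))
```

With these toy parameters the lattice count lands close to the H-matrix count and well below the BLR-like count, which mirrors the trend described in the abstract. The numbers depend entirely on the assumed admissibility rule, rank, and leaf size; they are not measurements from the paper.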
 Authors:
 Yamazaki, Ichitaro; Ida, Akihiro; Yokota, Rio; Dongarra, Jack
 Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
 The Univ. of Tokyo, Tokyo (Japan)
 Tokyo Inst. of Technology, Tokyo (Japan)
 The Univ. of Tennessee, Knoxville, TN (United States)
 Publication Date:
 August 2019
 Research Org.:
 Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
 Sponsoring Org.:
 USDOE National Nuclear Security Administration (NNSA)
 OSTI Identifier:
 1559494
 Report Number(s):
 SAND2019-8102J
Journal ID: ISSN 1094-3420; 677691
 Grant/Contract Number:
 AC04-94AL85000
 Resource Type:
 Accepted Manuscript
 Journal Name:
 International Journal of High Performance Computing Applications
 Additional Journal Information:
 Journal Volume: 33; Journal Issue: 5; Journal ID: ISSN 1094-3420
 Publisher:
 SAGE
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING; boundary element method; LU factorization; distributed memory; hierarchical matrix; task programming
Citation Formats
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States: N. p., 2019.
Web. doi:10.1177/1094342019861139.
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, & Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States. https://doi.org/10.1177/1094342019861139
Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. Thu .
"Distributed-memory lattice H-matrix factorization". United States. https://doi.org/10.1177/1094342019861139. https://www.osti.gov/servlets/purl/1559494.
@article{osti_1559494,
title = {Distributed-memory lattice H-matrix factorization},
author = {Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio and Dongarra, Jack},
abstractNote = {We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than parallelizing the H-matrix-vector multiplication because of the dataflow of the factorization, and much harder than parallelizing a dense matrix factorization because of the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare the factorization performance of the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. Finally, we compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.},
doi = {10.1177/1094342019861139},
journal = {International Journal of High Performance Computing Applications},
number = 5,
volume = 33,
place = {United States},
year = {2019},
month = {8}
}