Distributed-memory lattice H-matrix factorization

Yamazaki, Ichitaro; Ida, Akihiro; Yokota, Rio; Dongarra, Jack

doi:10.1177/1094342019861139


Abstract

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than parallelizing the H-matrix-vector multiplication because of the dataflow of the factorization, and it is much harder than parallelizing a dense matrix factorization because of the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonal and off-diagonal) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, the lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare the factorization performance of the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format, with its lower cost of factorization, may lead to faster factorization on the distributed-memory computer.
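To make the structural difference between the three formats concrete, the following is a minimal Python sketch under illustrative assumptions that are not taken from the paper: rows and columns are indexed by points 0..n-1 on a line, a block is treated as compressible under a standard strong admissibility rule with parameter `eta`, and every low-rank block is assumed to have the same fixed `rank`. All names (`admissible`, `h_partition`, `blr_partition`, `lattice_partition`, `storage`, `tile`, `leaf`) are hypothetical. The sketch only builds block partitions and counts stored entries; it is not the authors' implementation and does not perform the factorization.

```python
# Hypothetical sketch: block partitions of an n x n matrix over the 1D point set
# {0, ..., n-1} in H-matrix, BLR, and lattice H-matrix form (not the authors' code).
# Each leaf is (row_start, row_end, col_start, col_end, kind) with half-open ranges.

def admissible(r0, r1, c0, c1, eta=1.0):
    """Assumed strong admissibility: min(diam) <= eta * dist, and dist > 0."""
    dist = max(c0 - (r1 - 1), r0 - (c1 - 1), 0)
    diam = min(r1 - r0 - 1, c1 - c0 - 1)
    return dist > 0 and diam <= eta * dist


def h_partition(r0, r1, c0, c1, leaf, eta=1.0, out=None):
    """Recursively bisect a block until it is admissible or at most `leaf` wide."""
    if out is None:
        out = []
    if admissible(r0, r1, c0, c1, eta):
        out.append((r0, r1, c0, c1, "lowrank"))
    elif r1 - r0 <= leaf or c1 - c0 <= leaf:
        out.append((r0, r1, c0, c1, "dense"))
    else:
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for a, b in ((r0, rm), (rm, r1)):
            for c, d in ((c0, cm), (cm, c1)):
                h_partition(a, b, c, d, leaf, eta, out)
    return out


def tiles(n, tile):
    """Uniform grid of tile x tile blocks; the last row/column of tiles may be smaller."""
    for r0 in range(0, n, tile):
        for c0 in range(0, n, tile):
            yield r0, min(r0 + tile, n), c0, min(c0 + tile, n)


def blr_partition(n, tile, eta=1.0):
    """BLR: every tile is a single dense or low-rank block (no hierarchy)."""
    return [(r0, r1, c0, c1, "lowrank" if admissible(r0, r1, c0, c1, eta) else "dense")
            for r0, r1, c0, c1 in tiles(n, tile)]


def lattice_partition(n, tile, leaf, eta=1.0):
    """Lattice H-matrix: the BLR grid, with every tile stored as its own H-matrix."""
    out = []
    for r0, r1, c0, c1 in tiles(n, tile):
        h_partition(r0, r1, c0, c1, leaf, eta, out)
    return out


def storage(leaves, rank):
    """Stored entries: m*k for a dense leaf, rank*(m+k) for a low-rank leaf."""
    total = 0
    for r0, r1, c0, c1, kind in leaves:
        m, k = r1 - r0, c1 - c0
        total += rank * (m + k) if kind == "lowrank" else m * k
    return total


# Usage: compare block counts and storage for one (hypothetical) configuration.
n, tile, leaf, rank = 256, 64, 8, 2
for name, leaves in [("H-matrix", h_partition(0, n, 0, n, leaf)),
                     ("BLR", blr_partition(n, tile)),
                     ("lattice", lattice_partition(n, tile, leaf))]:
    print(name, len(leaves), "blocks,", storage(leaves, rank), "entries")
```

In this sketch, setting `leaf = tile` turns every lattice into a single dense or low-rank leaf and recovers the BLR partition, while setting `tile = n` recovers the H-matrix partition of the whole matrix, which mirrors the statement that the lattice format generalizes BLR.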

Authors:

Yamazaki, Ichitaro [1]; Ida, Akihiro [2]; Yokota, Rio [3]; Dongarra, Jack [4]

[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[2] The Univ. of Tokyo, Tokyo (Japan)
[3] Tokyo Inst. of Technology, Tokyo (Japan)
[4] The Univ. of Tennessee, Knoxville, TN (United States)

Publication Date: August 1, 2019

Research Org.: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Org.: USDOE National Nuclear Security Administration (NNSA)

OSTI Identifier: 1559494

Report Number(s): SAND-2019-8102J
Journal ID: ISSN 1094-3420; 677691

Grant/Contract Number: AC04-94AL85000

Resource Type: Accepted Manuscript

Journal Name: International Journal of High Performance Computing Applications

Additional Journal Information: Journal Volume: 33; Journal Issue: 5; Journal ID: ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; boundary element method; LU factorization; distributed memory; hierarchical matrix; task programming

Citation Formats


Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States: N. p., 2019. Web. doi:10.1177/1094342019861139.



Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, & Dongarra, Jack. Distributed-memory lattice H-matrix factorization. United States. https://doi.org/10.1177/1094342019861139



Yamazaki, Ichitaro, Ida, Akihiro, Yokota, Rio, and Dongarra, Jack. 2019. "Distributed-memory lattice H-matrix factorization". United States. https://doi.org/10.1177/1094342019861139. https://www.osti.gov/servlets/purl/1559494.

@article{osti_1559494,

  title        = {Distributed-memory lattice H-matrix factorization},

  author       = {Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio and Dongarra, Jack},

  abstractNote = {We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performances using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. In conclusion, we then compare the BLR and lattice H-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.},

  doi          = {10.1177/1094342019861139},

  journal      = {International Journal of High Performance Computing Applications},

  number       = 5,

  volume       = 33,

  place        = {United States},

  year         = {2019},

  month        = aug

}



Citation Metrics: Cited by 11 works (citation information provided by Web of Science).

Figures / Tables:

Fig. 1: Conceptual diagram of the construction of representative low-rank structured matrices using different partition structures $\mathscr{M}$: (a) $\mathscr{M}_{\mathscr{H}}$ for a general $\mathscr{H}$-matrix, (b) $\mathscr{M}_{W}$ for a Hierarchical Semi-Separable (HSS) matrix, (c) $\mathscr{M}_{L}$ for a Block Low-Rank (BLR) matrix, and (d) $\mathscr{M}_{L\mathscr{H}}$ for a lattice $\mathscr{H}$-matrix. Blank boxes show non-admissible blocks. Blocks in …

The article contains 24 figures and tables in total.




Similar Records in DOE PAGES and OSTI.GOV collections:

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

Journal Article: Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois-Henry; ... - SIAM Journal on Scientific Computing

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to sevenfold for problems in our test suite. The implementation targets …
Cited by 68
https://doi.org/10.1137/15M1010117

Sparse Approximate Multifrontal Factorization with Composite Compression Methods

Journal Article: Claus, Lisa; Ghysels, Pieter; Liu, Yang; ... - ACM Transactions on Mathematical Software

This article presents a fast and approximate multifrontal solver for large sparse linear systems. In a recent work by Liu et al., we showed the efficiency of a multifrontal solver leveraging the butterfly algorithm and its hierarchical matrix extension, HODBF (hierarchical off-diagonal butterfly) compression to compress large frontal matrices. The resulting multifrontal solver can attain quasi-linear computation and memory complexity when applied to sparse linear systems arising from spatial discretization of high-frequency wave equations. To further reduce the overall number of operations and especially the factorization memory usage to scale to larger problem sizes, in this article we develop a …
https://doi.org/10.1145/3611662

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

Journal Article: Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ... - ACM Transactions on Mathematical Software

In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, that computes the HSS form of an input dense matrix, relies …
Cited by 34
https://doi.org/10.1145/2930660

Hierarchical off-diagonal low-rank approximation of Hessians in inverse problems, with application to ice sheet model initialization

Journal Article: Hartland, Tucker; Stadler, Georg; Perego, Mauro; ... - Inverse Problems

Obtaining lightweight and accurate approximations of discretized objective functional Hessians in inverse problems governed by partial differential equations (PDEs) is essential to make both deterministic and Bayesian statistical large-scale inverse problems computationally tractable. The cubic computational complexity of dense linear algebraic tasks, such as Cholesky factorization, that provide a means to sample Gaussian distributions and determine solutions of Newton linear systems is a computational bottleneck at large-scale. These tasks can be reduced to log-linear complexity by utilizing hierarchical off-diagonal low-rank (HODLR) matrix approximations. In this work, we show that a class of Hessians that arise from inverse problems governed by …
https://doi.org/10.1088/1361-6420/acd719

MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Journal Article: Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun - IEEE Transactions on Knowledge and Data Engineering

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that …
Cited by 17
https://doi.org/10.1109/TKDE.2017.2767592
