# A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

## Abstract

In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by matrices of low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, in finite-element and boundary-element methods. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of the structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a larger effort, the STRUctured Matrices PACKage (STRUMPACK), a software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step toward a distributed-memory sparse solver.
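The abstract's key algorithmic idea is that the matrix need only be accessed through products with random vectors, from which a basis for the (numerically low-rank) off-diagonal ranges can be recovered. The following NumPy sketch illustrates this generic randomized range finder on a small low-rank matrix; it is not STRUMPACK code, and it uses a fixed oversampling parameter rather than the paper's adaptive sampling mechanism.

```python
import numpy as np

def randomized_range(A, rank, oversample=10):
    """Randomized range finder: recover an orthonormal basis for the
    column space of A using only matrix-vector products with random
    vectors (the access pattern that HSS compression exploits)."""
    m, n = A.shape
    # Random Gaussian test matrix; A is touched only via A @ Omega.
    Omega = np.random.default_rng(0).standard_normal((n, rank + oversample))
    Y = A @ Omega               # sample matrix: random linear combinations of columns
    Q, _ = np.linalg.qr(Y)      # orthonormal basis for the sampled range
    return Q

# Build a dense matrix that is exactly rank 5, mimicking a numerically
# low-rank off-diagonal block of a structured matrix.
rng = np.random.default_rng(1)
U = rng.standard_normal((500, 5))
V = rng.standard_normal((500, 5))
A = U @ V.T

Q = randomized_range(A, rank=5)
B = Q @ (Q.T @ A)               # low-rank approximation A ~ Q Q^T A
print(np.linalg.norm(A - B) / np.linalg.norm(A))  # tiny relative error
```

Because the input is only accessed through matrix-vector products, the same scheme applies whenever fast matvecs are available, which is what makes randomized compression attractive for the structured-matrix setting described above.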

- Authors:

- François-Henry Rouet, Xiaoye S. Li, Pieter Ghysels, and Artem Napov
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Univ. libre de Bruxelles (ULB), Brussels (Belgium)

- Publication Date:
- June 30, 2016

- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

- OSTI Identifier:
- 1393046

- Grant/Contract Number:
- AC02-05CH11231

- Resource Type:
- Journal Article: Accepted Manuscript

- Journal Name:
- ACM Transactions on Mathematical Software

- Additional Journal Information:
- Journal Volume: 42; Journal Issue: 4; Journal ID: ISSN 0098-3500

- Publisher:
- Association for Computing Machinery

- Country of Publication:
- United States

- Language:
- English

- Subject:
- 98 NUCLEAR DISARMAMENT, SAFEGUARDS, AND PHYSICAL PROTECTION; mathematical software; solvers; design; algorithms; performance; HSS matrices; randomized sampling; ULV factorization; parallel algorithms; distributed-memory

### Citation Formats

```
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, and Napov, Artem. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization. United States: N. p., 2016. Web. doi:10.1145/2930660.
```

```
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, & Napov, Artem. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization. United States. doi:10.1145/2930660.
```

```
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, and Napov, Artem. Thu Jun 30, 2016.
"A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization". United States.
doi:10.1145/2930660. https://www.osti.gov/servlets/purl/1393046.
```

```
@article{osti_1393046,
  title = {A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization},
  author = {Rouet, François-Henry and Li, Xiaoye S. and Ghysels, Pieter and Napov, Artem},
  abstractNote = {In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by matrices of low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, in finite-element and boundary-element methods. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of the structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a larger effort, the STRUctured Matrices PACKage (STRUMPACK), a software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step toward a distributed-memory sparse solver.},
  doi = {10.1145/2930660},
  journal = {ACM Transactions on Mathematical Software},
  number = {4},
  volume = {42},
  place = {United States},
  year = {2016},
  month = {jun}
}
```

*Citation information provided by*

Web of Science
