A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization
Abstract
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
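As a rough illustration of the randomized-sampling idea behind HSS compression described in the abstract (this is not the STRUMPACK API; the function name and parameters below are hypothetical), the following sketch compresses a numerically low-rank block by multiplying it with random vectors and orthonormalizing the resulting samples:

```python
import numpy as np

def randomized_lowrank(A, k, oversample=10):
    """Sketch of a randomized range finder: approximate A ~ Q @ (Q.T @ A),
    where Q has k + oversample orthonormal columns spanning (most of) range(A)."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    # Sample the range of A with Gaussian random vectors: S = A @ Omega.
    Omega = rng.standard_normal((n, k + oversample))
    S = A @ Omega
    # Orthonormalize the samples to obtain a basis for the approximate range.
    Q, _ = np.linalg.qr(S)
    return Q, Q.T @ A

# Build a matrix of exact rank 5 and check the compression error.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
Q, B = randomized_lowrank(A, k=5)
err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
print(err < 1e-10)
```

In an HSS compression, this kind of sampling is applied hierarchically to off-diagonal blocks, and the adaptive mechanism mentioned in the abstract grows the number of sample vectors until a tolerance is met rather than fixing `k` in advance.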
 Authors:
 Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
 Univ. libre de Bruxelles (ULB), Brussels (Belgium)
 Publication Date:
 Research Org.:
 Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
 Sponsoring Org.:
 USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC21)
 OSTI Identifier:
 1393046
 Grant/Contract Number:
 AC0205CH11231
 Resource Type:
 Journal Article: Accepted Manuscript
 Journal Name:
 ACM Transactions on Mathematical Software
 Additional Journal Information:
 Journal Volume: 42; Journal Issue: 4; Journal ID: ISSN 00983500
 Publisher:
 Association for Computing Machinery
 Country of Publication:
 United States
 Language:
 English
 Subject:
 98 NUCLEAR DISARMAMENT, SAFEGUARDS, AND PHYSICAL PROTECTION; mathematical software; solvers; design; algorithms; performance; HSS matrices; randomized sampling; ULV factorization; parallel algorithms; distributed-memory
Citation Formats
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, and Napov, Artem. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization. United States: N. p., 2016.
Web. doi:10.1145/2930660.
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, & Napov, Artem. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization. United States. doi:10.1145/2930660.
Rouet, François-Henry, Li, Xiaoye S., Ghysels, Pieter, and Napov, Artem. 2016.
"A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization". United States.
doi:10.1145/2930660. https://www.osti.gov/servlets/purl/1393046.
@article{osti_1393046,
title = {A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization},
author = {Rouet, François-Henry and Li, Xiaoye S. and Ghysels, Pieter and Napov, Artem},
abstractNote = {In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.},
doi = {10.1145/2930660},
journal = {ACM Transactions on Mathematical Software},
number = 4,
volume = 42,
place = {United States},
year = 2016,
month = 6
}

Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid-1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited since the traditional MPI-based approaches will likely need to be extended. Thus, this is a good time to review lessons learned since the introduction of these two packages and to propose a simple yet effective …

Distributed shared memory multiprocessor architecture MEMSY for high performance parallel computations
The rapid progress of microprocessors provides economic solutions for small- and medium-scale data processing tasks, e.g., workstations. It is a challenging task to combine many powerful microprocessors into a fixed or reconfigurable array which is able to process very large processing tasks with supercomputer performance. Fortunately, many very large applications are regularly structured and can easily be partitioned. One example is physical phenomena, which are often described by mathematical models, e.g., by sets of partial differential equations (PDEs). In most cases, the mathematical models can only be computed approximately. The finer the used model is, the higher is the necessary …
Parallel sparse matrix computations in iterative solvers on distributed memory machines
Sparse matrix computations play an important role in iterative methods to solve systems of equations or eigenvalue problems that are applied during the solution of discretized partial differential equations. The large size of many technical or physical applications in this area results in the need for parallel execution of sparse operations, in particular sparse matrix-vector multiplication, on distributed memory computers. In this report, a data distribution and a communication scheme are presented for parallel sparse iterative solvers. Performance tests, using the conjugate gradient method, the QMR and the TFQMR algorithm for solving systems of equations, and the Lanczos method for …
A Magnetically Separable, Highly Stable Enzyme System Based on Nanocomposites of Enzymes and Magnetic Nanoparticles Shipped in Hierarchically Ordered, Mesocellular, Mesoporous Silica
Enzymes are versatile nanoscale biocatalysts, and find increasing applications in many areas, including organic synthesis[1-3] and bioremediation.[4,5] However, the application of enzymes is often hampered by the short catalytic lifetime of enzymes and by the difficulty in recovery and recycling. To solve these problems, there have been many efforts to develop effective enzyme immobilization techniques. Recent advances in nanotechnology provide more diverse materials and approaches for enzyme immobilization. For example, mesoporous materials offer potential advantages as a host for enzymes due to their well-controlled porosity and large surface area for the immobilization of enzymes.[6,7] On the other hand, …
A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs
We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of the matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, …