Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts
Abstract
Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared-memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared-memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We evaluate Basker's performance on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.
- Authors:
- Booth, Joshua Dennis; Ellingwood, Nathan David; Thornquist, Heidi K.; Rajamanickam, Sivasankaran
- Bucknell Univ., Lewisburg, PA (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Publication Date:
- 2017-06-03
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1499033
- Alternate Identifier(s):
- OSTI ID: 1550153
- Report Number(s):
- SAND-2019-2046J; Journal ID: ISSN 0167-8191; 672871
- Grant/Contract Number:
- AC04-94AL85000; NA-0003525
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Parallel Computing
- Additional Journal Information:
- Journal Volume: 68; Journal Issue: C; Journal ID: ISSN 0167-8191
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Parallel LU factorization; Multithreaded solvers; Circuit simulation; Solvers on Intel Xeon Phi
Citation Formats
Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., and Rajamanickam, Sivasankaran. Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts. United States: N. p., 2017. Web. doi:10.1016/j.parco.2017.06.003.
Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., & Rajamanickam, Sivasankaran. Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts. United States. https://doi.org/10.1016/j.parco.2017.06.003
Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., and Rajamanickam, Sivasankaran. 2017. "Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts". United States. https://doi.org/10.1016/j.parco.2017.06.003. https://www.osti.gov/servlets/purl/1499033.
@article{osti_1499033,
title = {Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts},
author = {Booth, Joshua Dennis and Ellingwood, Nathan David and Thornquist, Heidi K. and Rajamanickam, Sivasankaran},
abstractNote = {Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared-memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared-memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We evaluate Basker's performance on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.},
doi = {10.1016/j.parco.2017.06.003},
journal = {Parallel Computing},
number = {C},
volume = 68,
place = {United States},
year = {2017},
month = {6}
}
Works referenced in this record:
A survey of direct methods for sparse linear systems
journal, May 2016
- Davis, Timothy A.; Rajamanickam, Sivasankaran; Sid-Lakhdar, Wissam M.
- Acta Numerica, Vol. 25
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
journal, June 2003
- Li, Xiaoye S.; Demmel, James W.
- ACM Transactions on Mathematical Software, Vol. 29, Issue 2
PARDISO: a high-performance serial and parallel sparse linear solver in semiconductor device simulation
journal, September 2001
- Schenk, Olaf; Gärtner, Klaus; Fichtner, Wolfgang
- Future Generation Computer Systems, Vol. 18, Issue 1
PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems
journal, February 2002
- Hénon, P.; Ramet, P.; Roman, J.
- Parallel Computing, Vol. 28, Issue 2
A Supernodal Approach to Sparse Partial Pivoting
journal, January 1999
- Demmel, James W.; Eisenstat, Stanley C.; Gilbert, John R.
- SIAM Journal on Matrix Analysis and Applications, Vol. 20, Issue 3
An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination
journal, January 1999
- Demmel, James W.; Gilbert, John R.; Li, Xiaoye S.
- SIAM Journal on Matrix Analysis and Applications, Vol. 20, Issue 4
Algorithm 907: KLU, A Direct Sparse Solver for Circuit Simulation Problems
journal, September 2010
- Davis, Timothy A.; Palamadai Natarajan, Ekanathan
- ACM Transactions on Mathematical Software, Vol. 37, Issue 3
Sparse Partial Pivoting in Time Proportional to Arithmetic Operations
journal, September 1988
- Gilbert, John R.; Peierls, Tim
- SIAM Journal on Scientific and Statistical Computing, Vol. 9, Issue 5
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014
- Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
- Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
An Approximate Minimum Degree Ordering Algorithm
journal, October 1996
- Amestoy, Patrick R.; Davis, Timothy A.; Duff, Iain S.
- SIAM Journal on Matrix Analysis and Applications, Vol. 17, Issue 4
On Algorithms For Permuting Large Entries to the Diagonal of a Sparse Matrix
journal, January 2001
- Duff, I. S.; Koster, J.
- SIAM Journal on Matrix Analysis and Applications, Vol. 22, Issue 4
Computing the block triangular form of a sparse matrix
journal, December 1990
- Pothen, Alex; Fan, Chin-Ju
- ACM Transactions on Mathematical Software (TOMS), Vol. 16, Issue 4
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
report, January 2016
- Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery
The Role of Elimination Trees in Sparse Factorization
journal, January 1990
- Liu, Joseph W. H.
- SIAM Journal on Matrix Analysis and Applications, Vol. 11, Issue 1
The Theory of Elimination Trees for Sparse Unsymmetric Matrices
journal, January 2005
- Eisenstat, Stanley C.; Liu, Joseph W. H.
- SIAM Journal on Matrix Analysis and Applications, Vol. 26, Issue 3
Algorithmic Aspects of Vertex Elimination on Directed Graphs
journal, January 1978
- Rose, Donald J.; Tarjan, Robert Endre
- SIAM Journal on Applied Mathematics, Vol. 34, Issue 1
Algorithmic Aspects of Vertex Elimination on Graphs
journal, June 1976
- Rose, Donald J.; Tarjan, R. Endre; Lueker, George S.
- SIAM Journal on Computing, Vol. 5, Issue 2
The University of Florida sparse matrix collection
journal, November 2011
- Davis, Timothy A.; Hu, Yifan
- ACM Transactions on Mathematical Software, Vol. 38, Issue 1
Works referencing / citing this record:
Preparing sparse solvers for exascale computing
journal, January 2020
- Anzt, Hartwig; Boman, Erik; Falgout, Rob
- Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166