DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts

Abstract

Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and map well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. Here, we present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.
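
The abstract summarizes results as a geometric mean speedup over a set of matrices. As a minimal sketch of how such a figure is typically computed (this is not the authors' benchmarking code; the function name `geometric_mean_speedup` and the timings are hypothetical and do not come from the paper), the per-matrix speedup is the baseline time divided by the solver time, and the geometric mean is the n-th root of the product of those ratios:

```python
import math


def geometric_mean_speedup(baseline_times, solver_times):
    """Geometric mean of per-matrix speedups (baseline time / solver time).

    Hypothetical helper for illustration only.
    """
    if not baseline_times or len(baseline_times) != len(solver_times):
        raise ValueError("need two non-empty timing lists of equal length")
    # Sum logs instead of multiplying ratios to avoid overflow/underflow
    # when many matrices are involved.
    log_sum = sum(math.log(b / s) for b, s in zip(baseline_times, solver_times))
    return math.exp(log_sum / len(baseline_times))


# Made-up factorization times in seconds for three matrices
# (illustrative values, NOT measurements from the paper).
baseline = [8.0, 2.0, 4.0]   # e.g. a serial reference solver
parallel = [1.0, 1.0, 2.0]   # e.g. a multithreaded solver

speedup = geometric_mean_speedup(baseline, parallel)
print(f"geometric mean speedup: {speedup:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a single matrix with a very large speedup affects it less than it would affect an arithmetic mean.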

Authors:
 Booth, Joshua Dennis [1]; Ellingwood, Nathan David [2]; Thornquist, Heidi K. [2]; Rajamanickam, Sivasankaran [2]
  1. Bucknell Univ., Lewisburg, PA (United States)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
2017-06-03
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1499033
Alternate Identifier(s):
OSTI ID: 1550153
Report Number(s):
SAND-2019-2046J
Journal ID: ISSN 0167-8191; 672871
Grant/Contract Number:  
AC04-94AL85000; NA-0003525
Resource Type:
Accepted Manuscript
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 68; Journal Issue: C; Journal ID: ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Parallel LU factorization; Multithreaded solvers; Circuit simulation; Solvers on Intel Xeon Phi

Citation Formats

Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., and Rajamanickam, Sivasankaran. Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts. United States: N. p., 2017. Web. doi:10.1016/j.parco.2017.06.003.
Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., & Rajamanickam, Sivasankaran. Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts. United States. https://doi.org/10.1016/j.parco.2017.06.003
Booth, Joshua Dennis, Ellingwood, Nathan David, Thornquist, Heidi K., and Rajamanickam, Sivasankaran. 2017. "Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts". United States. https://doi.org/10.1016/j.parco.2017.06.003. https://www.osti.gov/servlets/purl/1499033.
@article{osti_1499033,
title = {Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts},
author = {Booth, Joshua Dennis and Ellingwood, Nathan David and Thornquist, Heidi K. and Rajamanickam, Sivasankaran},
abstractNote = {Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and map well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. Here, we present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.},
doi = {10.1016/j.parco.2017.06.003},
journal = {Parallel Computing},
number = {C},
volume = 68,
place = {United States},
year = {2017},
month = jun
}

Citation Metrics:
Cited by: 9 works
Citation information provided by Web of Science

Works referenced in this record:

A survey of direct methods for sparse linear systems
journal, May 2016

  • Davis, Timothy A.; Rajamanickam, Sivasankaran; Sid-Lakhdar, Wissam M.
  • Acta Numerica, Vol. 25
  • DOI: 10.1017/S0962492916000076

SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
journal, June 2003

  • Li, Xiaoye S.; Demmel, James W.
  • ACM Transactions on Mathematical Software, Vol. 29, Issue 2
  • DOI: 10.1145/779359.779361

PARDISO: a high-performance serial and parallel sparse linear solver in semiconductor device simulation
journal, September 2001


PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems
journal, February 2002


A Supernodal Approach to Sparse Partial Pivoting
journal, January 1999

  • Demmel, James W.; Eisenstat, Stanley C.; Gilbert, John R.
  • SIAM Journal on Matrix Analysis and Applications, Vol. 20, Issue 3
  • DOI: 10.1137/S0895479895291765

An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination
journal, January 1999

  • Demmel, James W.; Gilbert, John R.; Li, Xiaoye S.
  • SIAM Journal on Matrix Analysis and Applications, Vol. 20, Issue 4
  • DOI: 10.1137/S0895479897317685

Algorithm 907: KLU, A Direct Sparse Solver for Circuit Simulation Problems
journal, September 2010

  • Davis, Timothy A.; Palamadai Natarajan, Ekanathan
  • ACM Transactions on Mathematical Software, Vol. 37, Issue 3
  • DOI: 10.1145/1824801.1824814

Sparse Partial Pivoting in Time Proportional to Arithmetic Operations
journal, September 1988

  • Gilbert, John R.; Peierls, Tim
  • SIAM Journal on Scientific and Statistical Computing, Vol. 9, Issue 5
  • DOI: 10.1137/0909058

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014

  • Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.07.003

An Approximate Minimum Degree Ordering Algorithm
journal, October 1996

  • Amestoy, Patrick R.; Davis, Timothy A.; Duff, Iain S.
  • SIAM Journal on Matrix Analysis and Applications, Vol. 17, Issue 4
  • DOI: 10.1137/S0895479894278952

On Algorithms For Permuting Large Entries to the Diagonal of a Sparse Matrix
journal, January 2001


Computing the block triangular form of a sparse matrix
journal, December 1990

  • Pothen, Alex; Fan, Chin-Ju
  • ACM Transactions on Mathematical Software (TOMS), Vol. 16, Issue 4
  • DOI: 10.1145/98267.98287

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
report, January 2016

  • Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery
  • DOI: 10.2172/1237520

The Role of Elimination Trees in Sparse Factorization
journal, January 1990

  • Liu, Joseph W. H.
  • SIAM Journal on Matrix Analysis and Applications, Vol. 11, Issue 1
  • DOI: 10.1137/0611010

The Theory of Elimination Trees for Sparse Unsymmetric Matrices
journal, January 2005

  • Eisenstat, Stanley C.; Liu, Joseph W. H.
  • SIAM Journal on Matrix Analysis and Applications, Vol. 26, Issue 3
  • DOI: 10.1137/S089547980240563X

Algorithmic Aspects of Vertex Elimination on Directed Graphs
journal, January 1978

  • Rose, Donald J.; Tarjan, Robert Endre
  • SIAM Journal on Applied Mathematics, Vol. 34, Issue 1
  • DOI: 10.1137/0134014

Algorithmic Aspects of Vertex Elimination on Graphs
journal, June 1976

  • Rose, Donald J.; Tarjan, R. Endre; Lueker, George S.
  • SIAM Journal on Computing, Vol. 5, Issue 2
  • DOI: 10.1137/0205021

The University of Florida sparse matrix collection
journal, November 2011

  • Davis, Timothy A.; Hu, Yifan
  • ACM Transactions on Mathematical Software, Vol. 38, Issue 1
  • DOI: 10.1145/2049662.2049663

Works referencing / citing this record:

Preparing sparse solvers for exascale computing
journal, January 2020

  • Anzt, Hartwig; Boman, Erik; Falgout, Rob
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
  • DOI: 10.1098/rsta.2019.0053