Preparing sparse solvers for exascale computing

Anzt, Hartwig; Boman, Erik; Falgout, Rob; Ghysels, Pieter; Heroux, Michael; Li, Xiaoye; Curfman McInnes, Lois; Tran Mills, Richard; Rajamanickam, Sivasankaran; Rupp, Karl; Smith, Barry; Yamazaki, Ichitaro; Meier Yang, Ulrike

doi:10.1098/rsta.2019.0053


Journal Article · Sun Jan 19 23:00:00 EST 2020 · Philosophical Transactions of the Royal Society. A, Mathematical, Physical and Engineering Sciences

DOI: https://doi.org/10.1098/rsta.2019.0053 · OSTI ID: 1601440

Anzt, Hartwig ^[1]; Boman, Erik ^[2]; Falgout, Rob ^[3]; Ghysels, Pieter ^[4]; Heroux, Michael ^[2]; Li, Xiaoye ^[4]; Curfman McInnes, Lois ^[5]; Tran Mills, Richard ^[5]; Rajamanickam, Sivasankaran ^[2]; Rupp, Karl ^[6]; Smith, Barry ^[5]; Yamazaki, Ichitaro ^[2]; Meier Yang, Ulrike ^[3]

[1] Univ. of Tennessee, Knoxville, TN (United States)
[2] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[3] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
[4] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[5] Argonne National Lab. (ANL), Argonne, IL (United States)
[6] Vienna University of Technology, Wien (Austria)

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges.


Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1601440

Alternate ID(s): OSTI ID: 1604740; OSTI ID: 1607441; OSTI ID: 1770021

Report Number(s): SAND--2019-10821J; 679361

Journal Information: Philosophical Transactions of the Royal Society. A, Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166; ISSN 1364-503X

Publisher: The Royal Society Publishing

Country of Publication: United States

Language: English



Related Subjects

97 MATHEMATICS AND COMPUTING
mathematical libraries
sparse solvers
