Preparing sparse solvers for exascale computing

Anzt, Hartwig; Boman, Erik; Falgout, Rob; Ghysels, Pieter; Heroux, Michael; Li, Xiaoye; Curfman McInnes, Lois; Tran Mills, Richard; Rajamanickam, Sivasankaran; Rupp, Karl; Smith, Barry; Yamazaki, Ichitaro; Meier Yang, Ulrike

doi:10.1098/rsta.2019.0053

Journal Article · January 20, 2020 · Philosophical Transactions of the Royal Society. A, Mathematical, Physical and Engineering Sciences

DOI: https://doi.org/10.1098/rsta.2019.0053 · OSTI ID: 1601440

Anzt, Hartwig ^[1]; Boman, Erik ^[2]; Falgout, Rob ^[3]; Ghysels, Pieter ^[4]; Heroux, Michael ^[2]; Li, Xiaoye ^[4]; Curfman McInnes, Lois ^[5]; Tran Mills, Richard ^[5]; Rajamanickam, Sivasankaran ^[2]; Rupp, Karl ^[6]; Smith, Barry ^[5]; Yamazaki, Ichitaro ^[2]; Meier Yang, Ulrike ^[3]

^[1] Univ. of Tennessee, Knoxville, TN (United States)
^[2] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
^[3] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
^[4] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
^[5] Argonne National Lab. (ANL), Argonne, IL (United States)
^[6] Vienna University of Technology, Wien (Austria)

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)

Grant/Contract Number: AC04-94AL85000; AC02-05CH11231; AC02-06CH11357; AC52-07NA27344

OSTI ID: 1601440

Alternate ID(s): OSTI ID: 1604740; OSTI ID: 1607441; OSTI ID: 1770021

Report Number(s): SAND-2019-10821J; 1473756; LLNL-JRNL-786598; 679361

Journal Information: Philosophical Transactions of the Royal Society. A, Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166; ISSN 1364-503X

Publisher: The Royal Society Publishing

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 9 works

Citation information provided by Web of Science

Related Subjects

97 MATHEMATICS AND COMPUTING
sparse solvers
mathematical libraries
