A two-level GPU-accelerated incomplete LU preconditioner for general sparse linear systems

Xu, Tianshi; Li, Rui Peng; Osei-Kuffuor, Daniel

doi:10.1177/10943420251319334

Journal Article · Sat Feb 22 23:00:00 EST 2025 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/10943420251319334 · OSTI ID: 2537970

Xu, Tianshi [1]; Li, Rui Peng [2]; Osei-Kuffuor, Daniel [2]

[1] Emory University, Atlanta, GA (United States)
[2] Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed-memory parallel architectures, specifically those equipped with graphics processing units (GPUs). In addition to block-Jacobi, we present general-purpose two-level ILU Schur complement-based approaches, with different strategies for solving the coarse-level reduced system. These strategies are combined with modified ILU methods in the construction of the coarse-level operator in order to effectively remove smooth errors by targeting an algebraically smooth vector. We leverage available GPU-based sparse matrix kernels to accelerate the setup and solve phases of the proposed ILU preconditioner. We evaluate the efficiency of the proposed methods as a smoother for algebraic multigrid (AMG) and as a preconditioner for Krylov subspace methods on challenging anisotropic diffusion problems and a collection of general sparse matrices.
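
For orientation, the generic two-level Schur complement construction that this kind of preconditioner builds on can be written as follows. This is a textbook sketch, not the formulation used in the paper: the block names B, F, E, C, the factors L_B and U_B, and the vectors u, y, f, g are assumed notation that does not appear in the abstract. It assumes the unknowns are reordered so that subdomain-interior unknowns come first and interface unknowns last.

```latex
% Reordered system: interior unknowns u, interface unknowns y (assumed notation)
A = \begin{pmatrix} B & F \\ E & C \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ E B^{-1} & I \end{pmatrix}
    \begin{pmatrix} B & F \\ 0 & S \end{pmatrix},
\qquad
S = C - E B^{-1} F .

% Solving A (u, y)^T = (f, g)^T then takes three steps:
\begin{aligned}
  g' &= g - E B^{-1} f,          && \text{(restrict to the interface)} \\
  S\,y &= g',                    && \text{(coarse-level reduced system)} \\
  u &= B^{-1}\,(f - F y).        && \text{(back-substitute into the interiors)}
\end{aligned}

% With an incomplete factorization B \approx L_B U_B, each B^{-1} above is
% replaced by U_B^{-1} L_B^{-1}, giving an approximate Schur complement
\tilde{S} = C - E\, U_B^{-1} L_B^{-1} F .
```

In this setting B is block diagonal with one block per subdomain, so the applications of B^{-1} are local to each subdomain, while the reduced system in S couples the subdomains through the interface unknowns. Dropping the coupling blocks E and F altogether leaves only the subdomain solves, which corresponds to a block-Jacobi preconditioner.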

Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC52-07NA27344

OSTI ID: 2537970

Alternate ID(s): OSTI ID: 2522848

Report Number(s): LLNL--JRNL-813686; 1021773

Journal Information: International Journal of High Performance Computing Applications, Vol. 39, Issue 3; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
AMG
GPU computing
ILU factorization
Multilevel methods
Preconditioning
