A Block-Asynchronous Relaxation Method for Graphics Processing Units

Anzt, Hartwig; Tomov, Stanimire; Dongarra, Jack; Heuveline, Vincent

doi:10.2172/1173288

Technical Report · November 30, 2011

DOI: https://doi.org/10.2172/1173288 · OSTI ID: 1173288

Anzt, Hartwig ^[1]; Tomov, Stanimire ^[2]; Dongarra, Jack ^[3]; Heuveline, Vincent ^[1]

[1] Karlsruhe Inst. of Technology (KIT) (Germany)
[2] Univ. of Tennessee, Knoxville, TN (United States)
[3] Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)

In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection, we monitor the convergence behavior, the average iteration time, and the total time-to-solution. Analyzing the results, we observe that even our most basic asynchronous relaxation scheme, despite its (expected) lower convergence rate compared to Gauss-Seidel relaxation, provides solution approximations of a given accuracy on GPUs in considerably shorter time than Gauss-Seidel running on CPUs. Hence, it more than compensates for the slower convergence by exploiting the scalability of the asynchronous schemes and their good fit for highly parallel GPU architectures. Further, we enhance the most basic asynchronous approach with hybrid schemes that use multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the "boundaries", subject to tuning various parameters. With these schemes we not only recover the loss of global convergence but often accelerate convergence by up to a factor of two (compared to the effective but difficult-to-parallelize Gauss-Seidel type of schemes), while keeping the execution time of a global iteration practically the same. This shows the high potential of asynchronous methods not only as standalone numerical solvers for linear systems of equations fulfilling certain convergence conditions but, more importantly, as smoothers in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance of and the need for asynchronous methods such as the ones described in this work are expected to grow.
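The hybrid scheme outlined in the abstract (several local sweeps inside each block, with values outside the block taken as they currently are) can be illustrated with a minimal sketch. This is not the report's CUDA implementation: it is a sequential Python emulation in which a shuffled block order stands in for the nondeterministic scheduling of GPU thread blocks. The names `block_async_relax`, `residual_norm`, `block_size`, `local_iters`, and `global_iters` are hypothetical, and the test matrix is an assumed strictly diagonally dominant example rather than one from the University of Florida Matrix Collection.

```python
import random


def block_async_relax(A, b, block_size=2, local_iters=3, global_iters=50, seed=0):
    """Sequential emulation of a block-asynchronous relaxation for A x = b.

    Assumption: A is a dense list-of-lists with nonzero diagonal; convergence
    is only expected under suitable conditions (e.g. strict diagonal dominance).
    """
    n = len(b)
    x = [0.0] * n
    rng = random.Random(seed)
    # Each block plays the role of the "subdomain" owned by one thread block.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for _ in range(global_iters):
        order = list(range(len(blocks)))
        rng.shuffle(order)  # emulate unordered block scheduling
        for bi in order:
            rows = blocks[bi]
            # Contribution of unknowns outside the block, read once and then
            # kept frozen for all local sweeps (Jacobi-like across boundaries).
            frozen = {
                i: b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in rows)
                for i in rows
            }
            for _ in range(local_iters):
                # Jacobi sweep restricted to the unknowns inside the block.
                new = {
                    i: (frozen[i] - sum(A[i][j] * x[j] for j in rows if j != i))
                    / A[i][i]
                    for i in rows
                }
                for i in rows:
                    x[i] = new[i]
    return x


def residual_norm(A, b, x):
    """Maximum-norm residual ||b - A x||."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


if __name__ == "__main__":
    # Assumed test problem: tridiagonal, strictly diagonally dominant.
    n = 6
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    b = [1.0] * n
    x = block_async_relax(A, b)
    print("residual:", residual_norm(A, b, x))
```

In this emulation a block processed later in a global iteration sees the values already written by earlier blocks, so it only approximates true asynchrony; on a GPU the reads and writes of different thread blocks would interleave without any fixed order.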

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC02-05CH11231

OSTI ID: 1173288

Report Number(s): LBNL-5784E

Country of Publication: United States

Language: English

Similar Records

Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU Cluster: Preprint

Conference · February 9, 2022 · OSTI ID:1173288

Thomas, Stephen; Yamazaki, Ichitaro; Berger-Vergiat, Luc; +5 more

Improved parallel solution techniques for the integral transport matrix method

Conference · November 23, 2010 · OSTI ID:1173288

Zerr, Robert J; Azmy, Yousry Y

Block-Iterative Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs

Technical Report · June 1, 2014 · OSTI ID:1173288

Philip, Bobby; Wang, Zhen; Berrill, Mark A

Related Subjects

97 MATHEMATICS AND COMPUTING
Asynchronous Relaxation
Chaotic Iteration
Graphics Processing Units (GPUs)
Jacobi Method
