Evaluating asynchronous Schwarz solvers on GPUs
- Karlsruhe Institute of Technology, Karlsruhe, Germany
- Karlsruhe Institute of Technology, Karlsruhe, Germany; University of Tennessee, Knoxville, USA
With the onset of the exascale computing era, most leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors, such as GPUs, alongside multiple CPU cores. For example, each node of ORNL’s Summit has six NVIDIA Tesla V100 GPUs and 42 IBM Power9 cores. Synchronizing across the compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that avoid the global synchronization points of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not synchronize explicitly; instead, communication between the sub-domains is completely asynchronous, which removes the bulk-synchronous nature of the algorithm.
We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of this asynchronous solver over its synchronous counterpart. We also study how the communication patterns, which are governed by the partitioning and the overlap between the sub-domains, affect the global solver. Finally, we show that this approach can deliver attractive performance benefits over the synchronous counterpart even for a well-balanced problem.
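The difference between the synchronous and asynchronous variants can be illustrated with a small single-process sketch. It is not the authors' MPI RMA or GPU implementation: the 1D Poisson model problem (matrix tridiag(-1, 2, -1)), the function names, and the random update order that stands in for unsynchronized communication are all assumptions made for illustration. Each sub-domain solves its overlapped local problem, using the current values just outside its range as boundary data, and writes back only the part it owns (the "restricted" step).

```python
import random

def solve_tridiag(rhs):
    """Thomas algorithm for the local system tridiag(-1, 2, -1) u = rhs."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def make_partition(n, parts, overlap):
    """Return (own_lo, own_hi, lo, hi) per sub-domain; [lo, hi) includes the overlap."""
    size = n // parts
    subs = []
    for k in range(parts):
        own_lo = k * size
        own_hi = n if k == parts - 1 else (k + 1) * size
        subs.append((own_lo, own_hi, max(0, own_lo - overlap), min(n, own_hi + overlap)))
    return subs

def local_solve(x, b, sub):
    """Solve on the overlapped range, return only the owned part (restriction)."""
    own_lo, own_hi, lo, hi = sub
    rhs = b[lo:hi]              # slicing copies, so b is not modified
    if lo > 0:
        rhs[0] += x[lo - 1]     # boundary value read from the left neighbour
    if hi < len(x):
        rhs[-1] += x[hi]        # boundary value read from the right neighbour
    u = solve_tridiag(rhs)
    return u[own_lo - lo:own_hi - lo]

def sync_sweep(x, b, subs):
    """Synchronous step: every sub-domain reads the same old iterate."""
    new = x[:]
    for sub in subs:
        new[sub[0]:sub[1]] = local_solve(x, b, sub)
    return new

def async_updates(x, b, subs, count, rng):
    """Simulated asynchronous run: sub-domains update in random order and read
    whatever neighbour data is currently in x, with no global barrier."""
    for _ in range(count):
        sub = rng.choice(subs)
        x[sub[0]:sub[1]] = local_solve(x, b, sub)
    return x

def residual_norm(x, b):
    """Max-norm of b - A x for A = tridiag(-1, 2, -1)."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(b[i] - (2.0 * x[i] - left - right)))
    return worst
```

As a usage example, with `subs = make_partition(32, 4, 2)` and `b = [1.0] * 32`, repeated calls to `sync_sweep` and a single long call to `async_updates` with a seeded `random.Random` both drive `residual_norm` toward zero. In a distributed setting, the reads of `x[lo - 1]` and `x[hi]` would correspond to neighbour data delivered through one-sided communication, which this sketch does not model.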
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1778413
- Journal Information:
- International Journal of High Performance Computing Applications, Vol. 35, Issue 3; ISSN 1094-3420
- Publisher:
- SAGE Publications
- Country of Publication:
- United States
- Language:
- English
Similar Records
Asynchronous Iterative Solvers for Extreme-Scale Computing