Kernel fusion in atomistic spin dynamics simulations on Nvidia GPUs using tensor core
- Northeastern Univ., Boston, MA (United States); Stanford Univ., CA (United States). Stanford Institute for Materials and Energy Sciences (SIMES); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States). Linac Coherent Light Source (LCLS); SLAC
- Rutgers Univ., Piscataway, NJ (United States)
- Stanford Univ., CA (United States). Stanford Institute for Materials and Energy Sciences (SIMES); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States). Linac Coherent Light Source (LCLS)
- Northeastern Univ., Boston, MA (United States)
In atomistic spin dynamics simulations, the time cost of constructing the space- and time-displaced pair correlation function in real space increases quadratically with the number of spins N, leading to significant computational effort. The GEMM subroutine can be adopted to accelerate the calculation of the dynamical spin-spin correlation function, but simulating large spin systems (>40,000 spins) on CPUs remains computationally expensive. In this work, we perform the simulation on the graphics processing unit (GPU), a hardware solution widely used as an accelerator for scientific computing and deep learning. We show that GPUs can accelerate the simulation up to 25-fold compared to multi-core CPUs when the GEMM subroutine is used on both. To hide memory latency, we fuse the element-wise operation into the GEMM kernel using CUTLASS, which improves performance by 26% to 33% compared to an implementation based on cuBLAS. Furthermore, we perform the on-the-fly calculation in the epilogue of the GEMM subroutine to avoid saving intermediate results in global memory, which makes large-scale atomistic spin dynamics simulations feasible and affordable.
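The fusion idea in the abstract can be illustrated with a minimal CPU-side sketch in plain Python. It is not the authors' CUTLASS kernel. It assumes a hypothetical 1D periodic chain of N three-component spins, and it assumes the element-wise step is a binning of each pair product S_i(t) · S_j(0) by displacement r = (j - i) mod N; the abstract does not specify the actual operation. The function names `correlation_unfused` and `correlation_fused` are illustrative only. The unfused version stores the full N x N product before reducing it, while the fused version consumes each product as soon as it is computed, which is the role the abstract attributes to the GEMM epilogue.

```python
def correlation_unfused(spins_t, spins_0):
    """Two passes: store the full N x N pair matrix, then reduce it."""
    n = len(spins_t)
    # Pass 1 (the GEMM): full[i][j] = S_i(t) . S_j(0), kept in memory.
    full = [
        [sum(a * b for a, b in zip(spins_t[i], spins_0[j])) for j in range(n)]
        for i in range(n)
    ]
    # Pass 2 (assumed element-wise step): bin by displacement r = (j - i) mod n.
    corr = [0.0] * n
    for i in range(n):
        for j in range(n):
            corr[(j - i) % n] += full[i][j] / n
    return corr


def correlation_fused(spins_t, spins_0):
    """One pass: each pair product is binned immediately and never stored."""
    n = len(spins_t)
    corr = [0.0] * n
    for i in range(n):
        for j in range(n):
            # "Epilogue": use the freshly computed dot product right away.
            dot = sum(a * b for a, b in zip(spins_t[i], spins_0[j]))
            corr[(j - i) % n] += dot / n
    return corr


# Hypothetical test input: a 4-site antiferromagnetic chain along z.
spins = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
result_unfused = correlation_unfused(spins, spins)
result_fused = correlation_fused(spins, spins)
```

Both versions still do O(N^2) work, consistent with the quadratic scaling noted above. The fused version only removes the O(N^2) intermediate storage, leaving an O(N) result array.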
- Research Organization:
- SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- Grant/Contract Number:
- AC02-76SF00515; SC0022216; AC02-05CH11231
- OSTI ID:
- 2446864
- Journal Information:
- Journal of Computational Science, Vol. 81; ISSN 1877-7503
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Similar Records
A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs
GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems