U.S. Department of Energy
Office of Scientific and Technical Information

Kernel fusion in atomistic spin dynamics simulations on Nvidia GPUs using tensor core

Journal Article · Journal of Computational Science
 [1];  [2];  [3];  [4]
  1. Northeastern Univ., Boston, MA (United States); Stanford Univ., CA (United States). Stanford Institute for Materials and Energy Sciences (SIMES); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States). Linac Coherent Light Source (LCLS); SLAC
  2. Rutgers Univ., Piscataway, NJ (United States)
  3. Stanford Univ., CA (United States). Stanford Institute for Materials and Energy Sciences (SIMES); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States). Linac Coherent Light Source (LCLS)
  4. Northeastern Univ., Boston, MA (United States)

In atomistic spin dynamics simulations, the time cost of constructing the space- and time-displaced pair correlation function in real space increases quadratically with the number of spins N, leading to significant computational effort. The GEMM subroutine can be adopted to accelerate the calculation of the dynamical spin-spin correlation function, but simulating large spin systems (>40000 spins) on CPUs remains computationally expensive. In this work, we perform the simulation on the graphics processing unit (GPU), a hardware solution widely used as an accelerator for scientific computing and deep learning. Here we show that GPUs can accelerate the simulation up to 25-fold compared to multi-core CPUs when using the GEMM subroutine on both. To hide memory latency, we fuse the element-wise operation into the GEMM kernel using CUTLASS, which improves performance by 26% to 33% compared to an implementation based on cuBLAS. Furthermore, we perform the on-the-fly calculation in the epilogue of the GEMM subroutine to avoid saving intermediate results in global memory, which makes large-scale atomistic spin dynamics simulation feasible and affordable.
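
The fusion idea described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUTLASS kernel: the function names, the 1D integer site positions, and the choice of binning by displacement as the element-wise step are all assumptions made for illustration. The unfused version stores the full N x N matrix of spin dot products before reducing it, while the fused version reduces each tile as soon as it is computed, so no N x N intermediate is ever stored.

```python
from collections import defaultdict


def pair_products(s0, st):
    """Unfused GEMM step: C[i][j] = s0[i] . st[j], i.e. (N x 3) times (3 x N)."""
    return [[sum(a * b for a, b in zip(si, sj)) for sj in st] for si in s0]


def correlation_unfused(s0, st, pos):
    """Store the full N x N product matrix, then bin it by displacement."""
    c = pair_products(s0, st)  # O(N^2) intermediate held in memory
    acc, cnt = defaultdict(float), defaultdict(int)
    for i in range(len(pos)):
        for j in range(len(pos)):
            d = pos[j] - pos[i]  # hypothetical 1D displacement between sites
            acc[d] += c[i][j]
            cnt[d] += 1
    return {d: acc[d] / cnt[d] for d in acc}


def correlation_fused(s0, st, pos, tile=2):
    """Bin each tile of products right after computing it (epilogue-style)."""
    acc, cnt = defaultdict(float), defaultdict(int)
    n = len(pos)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # The tile's products are consumed here and never written out.
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    prod = sum(a * b for a, b in zip(s0[i], st[j]))
                    d = pos[j] - pos[i]
                    acc[d] += prod
                    cnt[d] += 1
    return {d: acc[d] / cnt[d] for d in acc}


# Hypothetical test system: four spins on a 1D chain, alternating along z.
sites = [0, 1, 2, 3]
spins = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
```

Calling `correlation_fused(spins, spins, sites)` gives the equal-time correlation of this toy chain and agrees with `correlation_unfused`. On a GPU, the analogous saving comes from not writing the N x N product matrix to global memory.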

Research Organization:
SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-76SF00515; SC0022216; AC02-05CH11231
OSTI ID:
2446864
Journal Information:
Journal of Computational Science, Vol. 81; ISSN 1877-7503
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

Similar Records

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA Tesla GPU Cluster
Conference · Mon Aug 31 00:00:00 EDT 2009 · OSTI ID:965387

A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs
Journal Article · Tue Jul 11 00:00:00 EDT 2023 · Computer Physics Communications · OSTI ID:2308837

GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems
Journal Article · Wed Dec 30 23:00:00 EST 2020 · Computer Physics Communications · OSTI ID:1773653