GPU acceleration of the Locally Selfconsistent Multiple Scattering code for first principles calculation of the ground state and statistical physics of materials
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- NVIDIA Corporation, Santa Clara, CA (United States)
The Locally Self-consistent Multiple Scattering (LSMS) code solves the first-principles density functional theory Kohn–Sham equation for a wide range of materials, with a special focus on metals, alloys and metallic nano-structures. It has traditionally exhibited near-perfect scalability on massively parallel high-performance computer architectures. In this paper, we present our efforts to exploit GPUs to accelerate the LSMS code to enable first-principles calculations of O(100,000) atoms and statistical physics sampling of finite-temperature properties. We reimplement the scattering matrix calculation for GPUs with a block matrix inversion algorithm that uses only accelerator memory. Finally, using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- Contributing Organization:
- NVIDIA Corporation, Santa Clara, CA (United States)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1335344
- Journal Information:
- Computer Physics Communications, Vol. 211; ISSN 0010-4655
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English