GPU acceleration of the Locally Selfconsistent Multiple Scattering code for first principles calculation of the ground state and statistical physics of materials
The Locally Selfconsistent Multiple Scattering (LSMS) code solves the first principles Density Functional Theory Kohn–Sham equation for a wide range of materials, with a special focus on metals, alloys and metallic nanostructures. It has traditionally exhibited near perfect scalability on massively parallel high performance computer architectures. In this paper, we present our efforts to exploit GPUs to accelerate the LSMS code to enable first principles calculations of O(100,000) atoms and statistical physics sampling of finite temperature properties. We reimplement the scattering matrix calculation for GPUs with a block matrix inversion algorithm that only uses accelerator memory. Finally, using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.
 Authors:
Eisenbach, Markus^{[1]};
Larkin, Jeff^{[2]};
Lutjens, Justin^{[2]};
Rennich, Steven^{[2]};
Rogers, James H.^{[1]}
 [1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 [2] NVIDIA Corporation, Santa Clara, CA (United States)
 Publication Date:
 Grant/Contract Number:
 AC05-00OR22725
 Type:
 Accepted Manuscript
 Journal Name:
 Computer Physics Communications
 Additional Journal Information:
 Journal Volume: 211; Journal ID: ISSN 0010-4655
 Publisher:
 Elsevier
 Research Org:
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 Sponsoring Org:
 USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
 Contributing Orgs:
 NVIDIA Corporation, Santa Clara, CA (United States)
 Country of Publication:
 United States
 Language:
 English
 Subject:
 36 MATERIALS SCIENCE; First principles; Monte Carlo; Phase transitions
 OSTI Identifier:
 1335344
 Alternate Identifier(s):
 OSTI ID: 1396465
Eisenbach, Markus, Larkin, Jeff, Lutjens, Justin, Rennich, Steven, and Rogers, James H. 2016.
"GPU acceleration of the Locally Selfconsistent Multiple Scattering code for first principles calculation of the ground state and statistical physics of materials". United States.
doi:10.1016/j.cpc.2016.07.013. https://www.osti.gov/servlets/purl/1335344.
@article{osti_1335344,
title = {GPU acceleration of the Locally Selfconsistent Multiple Scattering code for first principles calculation of the ground state and statistical physics of materials},
author = {Eisenbach, Markus and Larkin, Jeff and Lutjens, Justin and Rennich, Steven and Rogers, James H.},
abstractNote = {The Locally Selfconsistent Multiple Scattering (LSMS) code solves the first principles Density Functional Theory Kohn--Sham equation for a wide range of materials, with a special focus on metals, alloys and metallic nanostructures. It has traditionally exhibited near perfect scalability on massively parallel high performance computer architectures. In this paper, we present our efforts to exploit GPUs to accelerate the LSMS code to enable first principles calculations of O(100,000) atoms and statistical physics sampling of finite temperature properties. We reimplement the scattering matrix calculation for GPUs with a block matrix inversion algorithm that only uses accelerator memory. Finally, using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.},
doi = {10.1016/j.cpc.2016.07.013},
journal = {Computer Physics Communications},
volume = {211},
place = {United States},
year = {2016},
month = {7}
}