Title: Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

We profile and optimize calculations performed with the BerkeleyGW code on the Xeon Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels and on BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread-, and node-level parallelism. We discuss locality changes (including the consequences of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it can exploit a large amount of parallelism over plane-wave components, band pairs, and frequencies.
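The roofline study mentioned above bounds a kernel's attainable performance by the smaller of the machine's peak compute rate and its arithmetic intensity (FLOPs per byte moved) times the memory bandwidth. The minimal Python sketch below illustrates that bound. The machine parameters are hypothetical placeholders, not measured Knights Landing figures, and the two bandwidth values stand in only loosely for on-package high-bandwidth memory versus conventional DDR. The helper names (`attainable_gflops`, `ridge_point`, `arithmetic_intensity`) are illustrative and are not part of BerkeleyGW.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte transferred to or from memory."""
    return flops / bytes_moved


def attainable_gflops(arith_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, intensity * memory bandwidth)."""
    return min(peak_gflops, arith_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, for illustration only (not measured values).
PEAK = 2000.0   # GFLOP/s
BW_HBM = 400.0  # GB/s, stand-in for on-package high-bandwidth memory
BW_DDR = 80.0   # GB/s, stand-in for off-package DDR

# Example kernel: double-precision y = a*x + y does 2 FLOPs per element and,
# under a simple counting assumption, moves 24 bytes (read x, read y, write y).
axpy_ai = arithmetic_intensity(2, 24)

for ai in (axpy_ai, 0.5, 2.0, 10.0):
    hbm = attainable_gflops(ai, PEAK, BW_HBM)
    ddr = attainable_gflops(ai, PEAK, BW_DDR)
    print(f"AI={ai:6.3f} FLOP/byte  HBM bound={hbm:7.1f}  DDR bound={ddr:7.1f} GFLOP/s")

print(f"ridge point: HBM {ridge_point(PEAK, BW_HBM):.1f}, DDR {ridge_point(PEAK, BW_DDR):.1f} FLOP/byte")
```

Under these assumed numbers, any kernel whose intensity lies to the left of the ridge point is limited by memory bandwidth. For such kernels, placing their working set in the faster memory raises the ceiling, while kernels to the right of the ridge gain from vectorization and threading instead.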
Conference: Presented at the ISC High Performance 2016 International Workshops - Application Performance on Intel Xeon Phi - Being Prepared for KNL & Beyond (IXPUG), 19-23 June 2016, Frankfurt, Germany
Cham, Switzerland: Springer International Publishing
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
USDOE Office of Energy Efficiency and Renewable Energy (EERE), NREL Laboratory Directed Research and Development (LDRD)
United States
97 MATHEMATICS AND COMPUTING; BerkeleyGW; optimization; performance