Title: Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

We profile and optimize calculations performed with the BerkeleyGW code on the Xeon Phi architecture. BerkeleyGW depends on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread-, and node-level parallelism. We discuss locality changes (including the consequence of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it can exploit a large amount of parallelism over plane-wave components, band pairs, and frequencies.
Resource Relation:
Conference: Presented at the ISC High Performance 2016 International Workshops - Application Performance on Intel Xeon Phi - Being Prepared for KNL & Beyond (IXPUG), 19-23 June 2016, Frankfurt, Germany
Cham, Switzerland: Springer International Publishing
Research Org:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Org:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), NREL Laboratory Directed Research and Development (LDRD)
Country of Publication:
United States
Subject:
97 MATHEMATICS AND COMPUTING; BerkeleyGW; optimization; performance