Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software
We profile and optimize calculations performed with the BerkeleyGW code on the Intel Xeon Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels and on BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that exploits vector, thread, and node-level parallelism, as well as locality changes (including the consequences of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it exposes a large amount of parallelism over plane-wave components, band pairs, and frequencies.
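The roofline study mentioned above rests on the standard roofline model: a kernel's attainable performance is the smaller of the machine's peak floating-point rate and the product of its arithmetic intensity (FLOPs per byte moved) with the memory bandwidth. The sketch below is a minimal illustration of that model under stated assumptions. The function names and parameters are hypothetical and are not taken from BerkeleyGW or from the authors' tooling. The peak and bandwidth values must be measured or looked up for the machine under study. Evaluating the same kernel against a DDR bandwidth and a high-bandwidth-memory bandwidth shows why placing data in on-package memory helps only memory-bound kernels.

```python
def attainable_gflops(ai_flops_per_byte: float,
                      peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Attainable GFLOP/s under the classic roofline model.

    ai_flops_per_byte: arithmetic intensity of the kernel (FLOPs / byte).
    peak_gflops:       machine compute ceiling (hypothetical input).
    bandwidth_gbs:     memory-bandwidth ceiling in GB/s (hypothetical input).
    """
    if ai_flops_per_byte < 0 or peak_gflops <= 0 or bandwidth_gbs <= 0:
        raise ValueError("intensity must be >= 0; peak and bandwidth must be > 0")
    # Performance is capped by whichever ceiling is lower at this intensity.
    return min(peak_gflops, ai_flops_per_byte * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which the kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


def is_memory_bound(ai_flops_per_byte: float,
                    peak_gflops: float,
                    bandwidth_gbs: float) -> bool:
    """True if the kernel sits left of the ridge point (bandwidth-limited)."""
    return ai_flops_per_byte < ridge_point(peak_gflops, bandwidth_gbs)
```

Raising a kernel's arithmetic intensity (for example, through better cache reuse) moves it right along the roofline, while switching from a slower to a faster memory tier raises the sloped ceiling. Both effects matter only while the kernel remains left of the ridge point.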
- Research Organization:
- National Renewable Energy Lab. (NREL), Golden, CO (United States)
- Sponsoring Organization:
- USDOE Office of Energy Efficiency and Renewable Energy (EERE), NREL Laboratory Directed Research and Development (LDRD)
- DOE Contract Number:
- AC36-08GO28308
- OSTI ID:
- 1333057
- Report Number(s):
- NREL/CP-5K00-67446
- Resource Relation:
- Conference: Presented at the ISC High Performance 2016 International Workshops - Application Performance on Intel Xeon Phi - Being Prepared for KNL & Beyond (IXPUG), 19-23 June 2016, Frankfurt, Germany
- Country of Publication:
- United States
- Language:
- English