A Locality-Based Threading Algorithm for the Configuration-Interaction Method
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
- San Diego State Univ., San Diego, CA (United States). Dept. of Physics
- Univ. of California, Berkeley, CA (United States). Dept. of Physics
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
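The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of one common way to split work among threads with locality in mind, not the authors' algorithm. It assumes the workload can be described as rows of a sparse operator with a known nonzero count per row, and it gives each thread one contiguous block of rows with roughly equal total work, so that each thread reads and writes a compact region of memory. The function name `partition_rows_by_work` and its parameters are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_rows_by_work(nnz_per_row, num_threads):
    """Split rows into contiguous blocks of roughly equal work.

    nnz_per_row: work estimate (e.g. nonzeros) for each row (assumed input).
    num_threads: number of threads to partition for.
    Returns one half-open (start, end) row range per thread.
    """
    # prefix[i] is the total work of rows 0..i inclusive.
    prefix = list(accumulate(nnz_per_row))
    total = prefix[-1] if prefix else 0

    bounds = [0]
    for t in range(1, num_threads):
        # Ideal cumulative work at the end of thread t-1's block.
        target = (total * t) // num_threads
        # Number of leading rows whose cumulative work stays within target.
        bounds.append(bisect_right(prefix, target))
    bounds.append(len(nnz_per_row))

    # Contiguous ranges: every row belongs to exactly one thread.
    return [(bounds[i], bounds[i + 1]) for i in range(num_threads)]


if __name__ == "__main__":
    blocks = partition_rows_by_work([3, 3, 2, 4, 1, 5], 3)
    for thread_id, (start, end) in enumerate(blocks):
        print(thread_id, start, end)
```

Contiguous blocks are chosen here instead of a round-robin assignment because each thread then touches one compact slice of the output, which tends to reduce cache-line sharing between threads; whether this matches the paper's actual strategy is an assumption.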
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1393243
- Journal Information:
- IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, Vol. 2017; Conference: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lake Buena Vista, FL (United States), 29 May-2 Jun 2017; ISSN 2164-7062
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Similar Records
Performance and Energy Usage of Workloads on KNL and Haswell Architectures. In: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation
MILC staggered conjugate gradient performance on Intel KNL