DOE PAGES

Title: A Locality-Based Threading Algorithm for the Configuration-Interaction Method

The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
Authors:
 Shan, Hongzhang [1]; Williams, Samuel [1]; Johnson, Calvin [2]; McElvain, Kenneth [3]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  2. San Diego State Univ., San Diego, CA (United States). Dept. of Physics
  3. Univ. of California, Berkeley, CA (United States). Dept. of Physics
Publication Date:
Grant/Contract Number:
AC02-05CH11231
Type:
Accepted Manuscript
Journal Name:
IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum
Additional Journal Information:
Journal Volume: 2017; Conference: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lake Buena Vista, FL (United States), 29 May-2 Jun 2017; Journal ID: ISSN 2164-7062
Publisher:
IEEE
Research Org:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Manycore; locality-based threading algorithm; bigstick; configuration-interaction method; knights landing; Ivy bridge; MPI; OpenMP; multithreading; hybrid programming model
OSTI Identifier:
1393243

Shan, Hongzhang, Williams, Samuel, Johnson, Calvin, and McElvain, Kenneth. A Locality-Based Threading Algorithm for the Configuration-Interaction Method. United States: N. p., Web. doi:10.1109/IPDPSW.2017.15.
Shan, Hongzhang, Williams, Samuel, Johnson, Calvin, & McElvain, Kenneth. A Locality-Based Threading Algorithm for the Configuration-Interaction Method. United States. doi:10.1109/IPDPSW.2017.15.
Shan, Hongzhang, Williams, Samuel, Johnson, Calvin, and McElvain, Kenneth. 2017. "A Locality-Based Threading Algorithm for the Configuration-Interaction Method". United States. doi:10.1109/IPDPSW.2017.15. https://www.osti.gov/servlets/purl/1393243.
@article{osti_1393243,
title = {A Locality-Based Threading Algorithm for the Configuration-Interaction Method},
author = {Shan, Hongzhang and Williams, Samuel and Johnson, Calvin and McElvain, Kenneth},
abstractNote = {The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schr{\"o}dinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.},
doi = {10.1109/IPDPSW.2017.15},
journal = {IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum},
number = {},
volume = 2017,
place = {United States},
year = {2017},
month = {7}
}