U.S. Department of Energy
Office of Scientific and Technical Information

Critical Path-Based Thread Placement for NUMA Systems

Conference
Multicore multiprocessors use a Non-Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Unless data layout and the mapping of threads to cores are managed efficiently, scientific applications may lose performance even when they are optimized for NUMA. In this paper, we present algorithms and a runtime system that optimize the execution of OpenMP applications on NUMA architectures. Using information collected from hardware counters, the runtime system directs thread placement and reduces performance penalties by minimizing the critical path of OpenMP parallel regions. The runtime system uses a scalable algorithm that derives placement decisions with negligible overhead. We evaluate our algorithms and runtime system with four OpenMP applications from the NAS Parallel Benchmarks (NPB). On average, the algorithms improve performance by 8.13% to 25.68% over the default Linux thread placement scheme. The algorithms miss the optimal thread placement in only 8.9% of the cases.
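The idea of minimizing the critical path of a parallel region can be illustrated with a small sketch. This is not the paper's algorithm. It is a toy model under stated assumptions: each thread has a fixed compute cost and a count of memory accesses per NUMA node (standing in for what hardware counters might report), a remote access costs twice a local one, and a region ends at a barrier, so its length is the time of its slowest thread. All names, the cost constants, and the greedy ordering rule are hypothetical choices made for illustration.

```python
from itertools import product

LOCAL_COST = 1.0   # assumed cost of one local memory access (arbitrary units)
REMOTE_COST = 2.0  # assumed cost of one remote memory access


def thread_time(compute, accesses, node):
    """Estimated time of one thread if it runs on `node`.

    `accesses[m]` is the number of accesses the thread makes to memory on node m.
    """
    memory = sum(count * (LOCAL_COST if mem_node == node else REMOTE_COST)
                 for mem_node, count in enumerate(accesses))
    return compute + memory


def critical_path(threads, placement):
    """Length of a parallel region: the slowest thread reaches the barrier last."""
    return max(thread_time(compute, accesses, node)
               for (compute, accesses), node in zip(threads, placement))


def greedy_placement(threads, cores_per_node):
    """Heuristic: place the threads that lose most from a bad node first."""
    num_nodes = len(threads[0][1])
    free = [cores_per_node] * num_nodes
    placement = [None] * len(threads)

    def regret(t):
        times = [thread_time(*threads[t], n) for n in range(num_nodes)]
        return max(times) - min(times)

    for t in sorted(range(len(threads)), key=regret, reverse=True):
        candidates = [n for n in range(num_nodes) if free[n] > 0]
        best = min(candidates, key=lambda n: thread_time(*threads[t], n))
        placement[t] = best
        free[best] -= 1
    return placement


def exhaustive_placement(threads, cores_per_node):
    """Reference answer for tiny inputs: try every placement that fits."""
    num_nodes = len(threads[0][1])
    feasible = (p for p in product(range(num_nodes), repeat=len(threads))
                if all(p.count(n) <= cores_per_node for n in range(num_nodes)))
    return list(min(feasible, key=lambda p: critical_path(threads, p)))


# Four threads on two NUMA nodes with two cores each: (compute, accesses per node).
threads = [
    (10.0, [100, 0]),
    (10.0, [90, 10]),
    (10.0, [0, 100]),
    (10.0, [80, 20]),
]
round_robin = [t % 2 for t in range(len(threads))]
greedy = greedy_placement(threads, cores_per_node=2)
optimal = exhaustive_placement(threads, cores_per_node=2)
for name, placement in [("round-robin", round_robin), ("greedy", greedy), ("optimal", optimal)]:
    print(name, placement, critical_path(threads, placement))
```

The exhaustive search is only practical for a handful of threads. It is included so that a heuristic's result can be checked against the best feasible placement on small inputs, in the spirit of the abstract's comparison against optimal placement.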
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
1035298
Report Number(s):
LLNL-CONF-510002
Country of Publication:
United States
Language:
English

Similar Records

Critical Path-Based Thread Placement for NUMA Systems
Journal Article · Sat Dec 31 23:00:00 EST 2011 · Performance Evaluation Review · OSTI ID:1048161

Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach
Conference · Mon Dec 31 23:00:00 EST 2018 · OSTI ID:1574309

Page placement policies for NUMA multiprocessors
Journal Article · Thu Jan 31 23:00:00 EST 1991 · Journal of Parallel and Distributed Computing; (United States) · OSTI ID:5001639