Critical Path-Based Thread Placement for NUMA Systems
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Without efficiently managing data layout and thread mapping to cores, scientific applications, even if they are optimized for NUMA, may suffer performance loss. In this paper, we present algorithms and a runtime system that optimize the execution of OpenMP applications on NUMA architectures. By collecting information from hardware counters, the runtime system directs thread placement and reduces performance penalties by minimizing the critical path of OpenMP parallel regions. The runtime system uses a scalable algorithm that derives placement decisions with negligible overhead. We evaluate our algorithms and runtime system with four NPB applications implemented in OpenMP. On average the algorithms achieve between 8.13% and 25.68% performance improvement compared to the default Linux thread placement scheme. The algorithms miss the optimal thread placement in only 8.9% of the cases.
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 1035298
- Report Number(s):
- LLNL-CONF-510002; TRN: US201204%%530
- Resource Relation:
- Conference: Presented at: 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS11), Seattle, WA, United States, Nov 13 - Nov 13, 2011
- Country of Publication:
- United States
- Language:
- English
Similar Records
Petascale Computing Enabling Technologies Project Final Report
Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach