Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach
NUMA architectures are now common in compute-intensive systems. Achieving high performance for multi-threaded applications requires both a careful placement of threads on computing units and a thorough allocation of data in memory. Finding such a placement is a hard problem, because performance depends on complex interactions across several layers of the memory hierarchy. In this paper we propose a black-box approach to decide whether an application's execution time can be impacted by the placement of its threads and data and, if so, to choose the best placement strategy to adopt. We show that it is possible to reach near-optimal placement policy selection. Furthermore, the solutions work across several recent processor architectures, and decisions can be made with a single run of low-overhead profiling.
- Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization: Institut National de Recherche en Informatique et en Automatique (INRIA), Bordeaux Sud-Ouest; Centre National de la Recherche Scientifique (CNRS)
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1574309
- Resource Relation: Conference: 48th International Conference on Parallel Processing, 08/05/19 - 08/08/19, Kyoto, JP
- Country of Publication: United States
- Language: English