Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach

Denoyelle, Nicolas; Goglin, Brice; Jeannot, Emmanuel; Ropars, Thomas

doi:10.1145/3337821.3337893

Conference · 2019

DOI: https://doi.org/10.1145/3337821.3337893 · OSTI ID: 1574309

NUMA architectures are now common in compute-intensive systems. Achieving high performance for multi-threaded applications requires both careful placement of threads on computing units and careful allocation of data in memory. Finding such a placement is hard because performance depends on complex interactions across several layers of the memory hierarchy. In this paper, we propose a black-box approach to decide whether an application's execution time can be affected by the placement of its threads and data and, if so, to choose the best placement strategy. We show that it is possible to reach near-optimal placement policy selection. Furthermore, the solutions work across several recent processor architectures, and decisions can be made with a single run of low-overhead profiling.

Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)

Sponsoring Organization: Institut National de Recherche en Informatique et en Automatique (INRIA), Bordeaux Sud-Ouest; Centre National de la Recherche Scientifique (CNRS)

DOE Contract Number: AC02-06CH11357

OSTI ID: 1574309

Resource Relation: Conference: 48th International Conference on Parallel Processing, 08/05/19 - 08/08/19, Kyoto, JP

Country of Publication: United States

Language: English

Related Subjects

data
high-performance-computing
machine-learning
multicore-processors
numa
placement
threads
