U.S. Department of Energy
Office of Scientific and Technical Information

PPT-Multicore: performance prediction of OpenMP applications using reuse profiles and analytical modeling

Journal Article · Journal of Supercomputing
In this report we present PPT-Multicore, an analytical model embedded in the Performance Prediction Toolkit (PPT) that predicts the performance of parallel applications running on a multicore processor. PPT-Multicore builds upon our previous work on a multicore cache model. Using an architecture-independent, LLVM-based instrumentation tool, we extract a memory trace labeled by LLVM basic block; this extraction is needed only once in an application’s lifetime. The model takes the memory trace and other parameters from an instrumented, sequentially executed binary. We use probabilistic, computationally efficient reuse profiles to predict the cache hit rates and runtimes of the parallel sections of OpenMP programs. We model Intel’s Broadwell and Haswell architectures and AMD’s Zen2 architecture, and we validate our framework using applications from the PolyBench and PARSEC benchmark suites. The results show that PPT-Multicore predicts cache hit rates with an overall average error rate of 1.23% and runtimes with an error rate of 9.08%.
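To illustrate the general idea of a reuse profile, the following minimal sketch computes exact reuse distances from a single-threaded address trace and estimates the hit rate of a fully associative LRU cache. It is not the PPT-Multicore model, which is probabilistic and accounts for multicore effects; the function names `reuse_profile` and `lru_hit_rate`, the toy trace, and the fully associative LRU assumption are all hypothetical simplifications.

```python
from collections import Counter


def reuse_profile(trace):
    """Map each reuse distance to how often it occurs in the trace.

    The reuse distance of an access is the number of distinct addresses
    touched since the previous access to the same address. First-time
    (cold) accesses are recorded under the distance -1.
    """
    stack = []  # LRU stack: most recently used address is at the end
    profile = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distance = len(stack) - 1 - idx
            stack.pop(idx)
        else:
            distance = -1  # cold access, treated as infinite distance
        stack.append(addr)
        profile[distance] += 1
    return profile


def lru_hit_rate(profile, cache_lines):
    """Estimate the hit rate of a fully associative LRU cache.

    An access hits when its reuse distance is smaller than the number
    of lines the cache can hold; cold accesses always miss.
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    hits = sum(count for dist, count in profile.items() if 0 <= dist < cache_lines)
    return hits / total
```

For example, `lru_hit_rate(reuse_profile(list("abcabc")), 3)` evaluates to 0.5, because the three repeated accesses each have reuse distance 2 and fit in a three-line cache, whereas a two-line cache would miss on every access.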
Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1922761
Report Number(s):
LA-UR-21-22749
Journal Information:
Journal Name: Journal of Supercomputing; Journal Volume: 78; Journal Issue: 2; ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English
