PPT-Multicore: performance prediction of OpenMP applications using reuse profiles and analytical modeling
Journal Article · Journal of Supercomputing
- New Mexico State University, Las Cruces, NM (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
In this report we present PPT-Multicore, an analytical model embedded in the Performance Prediction Toolkit (PPT) that predicts the performance of parallel applications running on a multicore processor. PPT-Multicore builds upon our previous work towards a multicore cache model. Using an architecture-independent LLVM-based instrumentation tool, we extract a memory trace labeled by LLVM basic block; this extraction is needed only once in an application’s lifetime. The model takes the memory trace and other parameters from an instrumented, sequentially executed binary. We use probabilistic and computationally efficient reuse profiles to predict the cache hit rates and runtimes of the parallel sections of OpenMP programs. We model Intel’s Broadwell and Haswell and AMD’s Zen2 architectures and validate our framework using applications from the PolyBench and PARSEC benchmark suites. The results show that PPT-Multicore predicts cache hit rates with an overall average error rate of 1.23% and runtimes with an average error rate of 9.08%.
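To make the notion of a reuse profile concrete, the following is a minimal sketch and not the PPT-Multicore implementation. It assumes a toy trace of block addresses and a fully associative LRU cache, where an access hits exactly when its reuse distance (the number of distinct blocks touched since the previous access to the same block) is smaller than the cache capacity in blocks. The function names `reuse_profile` and `lru_hit_rate` are hypothetical, and the probabilistic, set-associative, and multicore aspects described in the abstract are not modeled here.

```python
from collections import Counter


def reuse_profile(trace):
    """Histogram of reuse distances for a sequence of block addresses.

    The key -1 counts cold (first-time) accesses. This naive version is
    quadratic in the trace length and is meant only as an illustration.
    """
    stack = []  # LRU stack: most recently used address is at the end
    profile = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Distinct addresses touched since the last access to addr
            distance = len(stack) - 1 - idx
            stack.pop(idx)
        else:
            distance = -1  # never seen before
        profile[distance] += 1
        stack.append(addr)
    return profile


def lru_hit_rate(profile, cache_blocks):
    """Hit rate of a fully associative LRU cache holding cache_blocks blocks.

    Assumption of this sketch: an access hits iff 0 <= distance < cache_blocks.
    """
    total = sum(profile.values())
    hits = sum(n for d, n in profile.items() if 0 <= d < cache_blocks)
    return hits / total if total else 0.0


# Toy trace (hypothetical data, not taken from the paper)
trace = ["a", "b", "c", "a", "b", "d", "a"]
profile = reuse_profile(trace)
```

Here every reuse in the toy trace has distance 2, so a cache of two blocks captures none of them while a cache of three blocks captures all three. The profile is computed once and can then be queried for any cache size, which is the property that makes reuse-profile models cheap to evaluate.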
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 1922761
- Report Number(s):
- LA-UR-21-22749
- Journal Information:
- Journal Name: Journal of Supercomputing; Journal Volume: 78; Journal Issue: 2; ISSN 0920-8542
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English