OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine Learning-enabled Scalable Performance Prediction of Scientific Codes

Journal Article · ACM Transactions on Modeling and Computer Simulation
DOI: https://doi.org/10.1145/3450264 · OSTI ID: 1841921

Hardware architectures become increasingly complex as compute capabilities grow toward exascale. Here, we present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform, which is defined by the input parameters. PPT-AMMP transforms the code to an architecture-independent intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build the prediction models from small instances of the input code, then integrates these models into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis that identifies bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and the basic block counts. We validate predicted SNAP runtimes against measured runtimes.
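
The reuse distance distributions mentioned in step (ii) can be illustrated with a minimal sketch. This is not PPT-AMMP code: it uses the textbook definition of reuse distance (the number of distinct addresses touched between two consecutive accesses to the same address) and assumes an idealized fully associative LRU cache. The function names `reuse_distances`, `reuse_profile`, and `lru_hit_rate`, and the toy trace, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    First-time accesses have no previous use and get math.inf.
    This is a simple O(N * M) version for clarity, not for speed.
    """
    stack = []       # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses accessed since the previous use of addr
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(math.inf)
        stack.append(addr)
    return distances


def reuse_profile(distances):
    """Turn a list of reuse distances into a probability distribution."""
    total = len(distances)
    return {d: c / total for d, c in Counter(distances).items()}


def lru_hit_rate(profile, cache_lines):
    """Hit rate of an idealized fully associative LRU cache (assumption).

    An access hits when its reuse distance is smaller than the number
    of lines the cache can hold.
    """
    return sum(p for d, p in profile.items() if d < cache_lines)


# Hypothetical toy trace of memory addresses
trace = ["a", "b", "c", "a", "b", "b"]
profile = reuse_profile(reuse_distances(trace))
print(lru_hit_rate(profile, cache_lines=3))
```

Because the profile depends only on the access pattern, the same distribution can be reused to estimate hit rates for several hypothetical cache sizes, which is what makes reuse distance attractive as an architecture-independent characterization.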

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1841921
Alternate ID(s):
OSTI ID: 1896442
Report Number(s):
LA-UR-20-23203; LA-UR-21-21328
Journal Information:
ACM Transactions on Modeling and Computer Simulation, Vol. 31, Issue 2; ISSN 1049-3301
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
