U.S. Department of Energy
Office of Scientific and Technical Information

Scalable Deep Learning-Based Microarchitecture Simulation on GPUs

Conference

Cycle-accurate microarchitecture simulators are essential tools for designers to architect, estimate, optimize, and manufacture new processors that meet specific design expectations. However, conventional simulators based on discrete-event methods often require an exceedingly long time-to-solution when simulating applications and architectures at full complexity and scale. Given the excitement around wielding the machine learning (ML) hammer to tackle various architecture problems, there have been attempts to employ ML to perform architecture simulations, such as Ithemal and SimNet. However, applying existing ML approaches directly to architecture simulation may be even slower because of overwhelming memory traffic and stringent sequential computation logic. This work proposes the first graphics processing unit (GPU)-based microarchitecture simulator that fully unleashes the potential of GPUs to accelerate state-of-the-art ML-based simulators. First, because application traces are loaded from the central processing unit (CPU) to the GPU for simulation, we introduce various designs to reduce the cost of data movement between CPUs and GPUs. Second, we propose a parallel simulation paradigm that partitions the application trace into sub-traces and simulates them in parallel, with rigorous error analysis and effective error correction mechanisms. Combined, this scalable GPU-based simulator outperforms traditional CPU-based simulators and the state-of-the-art ML-based simulators, i.e., SimNet and Ithemal, by orders of magnitude.
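
The idea of partitioning a trace into sub-traces can be illustrated with a minimal sketch. This is not the simulator described in the abstract: the latency rule, the `CONTEXT` window, the warm-up prefix used as a simple stand-in for error correction, and all function names are hypothetical, and CPU threads stand in for GPU execution purely to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

CONTEXT = 4  # hypothetical: number of earlier instructions that affect a latency


def predict_latency(instr, context):
    """Toy stand-in for an ML latency predictor (hypothetical rule)."""
    # An instruction is slower when the same opcode appears in recent context.
    return 1 + sum(1 for c in context if c == instr)


def simulate(trace, warmup=()):
    """Sequentially simulate `trace`, seeding the context with `warmup`."""
    context = list(warmup)[-CONTEXT:]
    total = 0
    for instr in trace:
        total += predict_latency(instr, context)
        context = (context + [instr])[-CONTEXT:]
    return total


def simulate_partitioned(trace, num_parts, warmup_len=CONTEXT):
    """Split the trace into sub-traces and simulate them concurrently."""
    size = max(1, -(-len(trace) // num_parts))  # ceiling division
    jobs = []
    for start in range(0, len(trace), size):
        # The warm-up prefix rebuilds the context lost at the cut point.
        warm = trace[max(0, start - warmup_len):start]
        jobs.append((trace[start:start + size], warm))
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda job: simulate(*job), jobs))


trace = [i % 3 for i in range(100)]
exact = simulate(trace)
with_warmup = simulate_partitioned(trace, 4)
without_warmup = simulate_partitioned(trace, 4, warmup_len=0)
print(exact, with_warmup, without_warmup)
```

In this toy model each latency depends only on the previous `CONTEXT` instructions, so a warm-up prefix of that length makes the partitioned total match the sequential one, while dropping the warm-up introduces an error at every cut point. A learned simulator has no such exact guarantee, which is presumably why the abstract pairs partitioning with error analysis and correction.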

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
SC0012704
OSTI ID:
1989626
Report Number(s):
BNL-224538-2023-COPA
Resource Relation:
Conference: SC '22: The International Conference on High Performance Computing, Networking, Storage and Analysis, Dallas, TX, 11/13/2022 - 11/18/2022
Country of Publication:
United States
Language:
English
