Scalable Deep Learning-Based Microarchitecture Simulation on GPUs
Cycle-accurate microarchitecture simulators are essential tools for designers to architect, estimate, optimize, and manufacture new processors that meet specific design expectations. However, conventional simulators based on discrete-event methods often require an exceedingly long time-to-solution to simulate applications and architectures at full complexity and scale. Given the excitement around wielding the machine learning (ML) hammer to tackle various architecture problems, there have been attempts to employ ML to perform architecture simulation, such as Ithemal and SimNet. However, directly applying existing ML approaches to architecture simulation may be even slower than conventional simulation, owing to overwhelming memory traffic and stringent sequential computation logic. This work proposes the first graphics processing unit (GPU)-based microarchitecture simulator that fully unleashes the potential of GPUs to accelerate state-of-the-art ML-based simulators. First, because application traces are loaded from the central processing unit (CPU) to the GPU for simulation, we introduce various designs to reduce the cost of data movement between CPUs and GPUs. Second, we propose a parallel simulation paradigm that partitions the application trace into sub-traces and simulates them in parallel, with rigorous error analysis and effective error correction mechanisms. Combined, these techniques let the scalable GPU-based simulator outperform traditional CPU-based simulators and the state-of-the-art ML-based simulators, SimNet and Ithemal, by orders of magnitude.
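The sub-trace partitioning idea can be illustrated with a small sketch. It is not the paper's method: the latency model `toy_latency`, the context length `WINDOW`, and the warm-up overlap used to correct boundary error are hypothetical assumptions chosen for illustration. Each sub-trace is simulated independently; without warm-up, instructions near a sub-trace boundary see a truncated context and their latencies are mispredicted, while seeding each sub-trace with the preceding `WINDOW` instructions restores the sequential result for this toy model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

WINDOW = 4  # hypothetical context length seen by the toy latency model


def toy_latency(context: Sequence[int], instr: int) -> int:
    """Hypothetical stand-in for an ML latency predictor: the cost of
    `instr` depends on the instructions in its recent context."""
    base = instr % 3 + 1
    # Extra stall if the same "opcode" appeared in the recent window.
    return base + (2 if instr in context else 0)


def simulate(trace: Sequence[int], warmup: Sequence[int] = ()) -> int:
    """Sequentially simulate `trace`. `warmup` only seeds the context;
    the latencies of warm-up instructions are not counted."""
    history: List[int] = list(warmup)[-WINDOW:]
    total = 0
    for instr in trace:
        context = history[-WINDOW:]
        total += toy_latency(context, instr)
        history.append(instr)
    return total


def simulate_partitioned(trace: Sequence[int], num_parts: int, warmup_len: int) -> int:
    """Split `trace` into sub-traces, simulate them concurrently, and sum.
    Each sub-trace is preceded by up to `warmup_len` earlier instructions."""
    size = -(-len(trace) // num_parts)  # ceiling division
    subs, warms = [], []
    for start in range(0, len(trace), size):
        subs.append(trace[start:start + size])
        warms.append(trace[max(0, start - warmup_len):start])
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(simulate, subs, warms))


if __name__ == "__main__":
    trace = [(i * i) % 7 for i in range(1000)]
    exact = simulate(trace)
    cold = simulate_partitioned(trace, num_parts=8, warmup_len=0)
    warm = simulate_partitioned(trace, num_parts=8, warmup_len=WINDOW)
    print(exact, cold, warm)
```

With this toy model, a warm-up equal to the context length makes the partitioned total match the sequential one, and cold starts can only undercount stalls, because a cold context is a suffix of the true context. Threads are used only to keep the sketch self-contained; CPU-bound Python threads do not run in parallel under the GIL, and a GPU implementation would instead batch the sub-traces.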
- Research Organization: Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21); Other
- DOE Contract Number: SC0012704
- OSTI ID: 1989626
- Report Number(s): BNL-224538-2023-COPA
- Resource Relation: Conference: SC '22: The International Conference on High Performance Computing, Networking, Storage and Analysis, Dallas, TX, 11/13/2022 - 11/18/2022
- Country of Publication: United States
- Language: English
Works referenced in this record:
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness (conference, January 2009)
- NUAT: A non-uniform access time memory controller (conference, February 2014)
- Accelerating architectural simulation by parallel execution of trace samples (conference, January 1994)
- Evidence-based static branch prediction using machine learning (journal, January 1997)
- Using SimPoint for accurate and efficient simulation (conference, January 2003)
- The gem5 simulator (journal, August 2011)
- SimpleScalar: an infrastructure for computer system modeling (journal, January 2002)
- Combining trace sampling with single pass methods for efficient cache simulation (journal, June 1998)
- SimNet (journal, May 2022)
- MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research (journal, January 2002)
- GPU Computing (journal, May 2008)
- GPU Performance Estimation using Software Rasterization and Machine Learning (journal, September 2017)
- ZSim (journal, June 2013)
- Predicting GPU Performance from CPU Runs Using Machine Learning (conference, October 2014)
- Interval simulation: Raising the level of abstraction in architectural simulation (conference, January 2010)
- Accurate phase-level cross-platform power and performance estimation (conference, June 2016)
- TurboSMARTS (conference, June 2005)
- C-SAW: A Framework for Graph Sampling and Random Walk on GPUs (conference, November 2020)
- Methods of inference and learning for performance modeling of parallel applications (conference, January 2007)
- Illustrative Design Space Studies with Microarchitectural Regression Models (conference, January 2007)
- Trust: Triangle Counting Reloaded on GPUs (journal, November 2021)
- Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance (conference, January 2015)
- gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs (journal, April 2022)
- GPGPU performance and power estimation using machine learning (conference, February 2015)
- A Simulator for Large-Scale Parallel Computer Architectures (journal, April 2010)
- MARSS: a full system simulator for multicore x86 CPUs (conference, January 2011)
- SPEC CPU2017 (conference, April 2018)
- Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed (conference, October 2015)
- Efficiently exploring architectural design spaces via predictive modeling (conference, October 2006)