Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training
- Boston University
- Meta
- University of Rochester
- Indiana University-Bloomington
- Battelle (Pacific Northwest National Laboratory)
Deep Learning Recommendation Models (DLRMs) are critical applications in many domains and have grown into one of the largest classes of machine learning workloads. With trillions of parameters, DLRMs exceed the on-chip memory capacity of GPUs, so distributed inference and training require large-scale multi-node systems, which suffer from an all-to-all communication bottleneck that limits the scalability of ever-growing DLRMs. In recent years, SmartNICs have evolved to couple computation with communication, offering a powerful heterogeneous device within the system. However, no existing distributed system fully leverages these abundant SmartNIC resources to resolve the scalability issues of DLRMs. In this work, we propose a software-hardware co-design of a heterogeneous SmartNIC system that resolves the communication bottleneck of distributed DLRMs, mitigates memory bandwidth pressure, and improves computation efficiency. We provide a set of SmartNIC cache designs (including a local cache and a remote cache) and SmartNIC computation kernels that reduce data movement, relieve memory-lookup intensity, and improve the GPUs' computation efficiency. In addition, we propose a graph algorithm that improves the data locality of queries within batches, optimizing overall system performance through higher data reuse. Our evaluation shows that the system achieves a 2.1x latency speedup for inference and a 1.6x throughput speedup for training.
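To make the local/remote cache idea in the abstract concrete, below is a minimal Python sketch of a two-tier embedding cache: a small local (on-NIC) LRU cache of hot embedding rows that falls back to a remote tier on a miss. This is a hypothetical illustration under assumed names (`TwoTierEmbeddingCache`, the capacity, the Zipfian workload), not the paper's implementation; in a real SmartNIC the remote fetch would be an RDMA read rather than an in-process array access.

```python
# Hypothetical sketch of the local/remote embedding cache concept.
# Not the paper's design: names, sizes, and the workload are assumptions.
from collections import OrderedDict
import numpy as np

EMBED_DIM = 16  # illustrative embedding width


class TwoTierEmbeddingCache:
    def __init__(self, local_capacity, remote_table):
        self.local = OrderedDict()           # LRU cache of hot rows (local tier)
        self.local_capacity = local_capacity
        self.remote_table = remote_table     # stand-in for the remote tier
        self.local_hits = 0
        self.remote_fetches = 0

    def lookup(self, row_id):
        if row_id in self.local:
            self.local.move_to_end(row_id)   # refresh LRU position on a hit
            self.local_hits += 1
            return self.local[row_id]
        # Miss: fetch from the remote tier (an RDMA read in practice)
        self.remote_fetches += 1
        row = self.remote_table[row_id]
        self.local[row_id] = row
        if len(self.local) > self.local_capacity:
            self.local.popitem(last=False)   # evict the least recently used row
        return row


# Usage: a skewed (Zipfian) access pattern, typical of DLRM embedding lookups,
# lets a small local cache absorb most of the traffic.
rng = np.random.default_rng(0)
table = rng.standard_normal((1000, EMBED_DIM)).astype(np.float32)
cache = TwoTierEmbeddingCache(local_capacity=64, remote_table=table)
ids = rng.zipf(1.5, size=10_000) % 1000      # a few hot rows dominate
for i in ids:
    cache.lookup(int(i))
print(f"local hit rate: {cache.local_hits / len(ids):.2%}")
```

The abstract's graph algorithm would complement such a cache by reordering queries within a batch so that lookups sharing embedding rows land close together, raising the local hit rate further.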
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1988131
- Report Number(s): PNNL-SA-181666
- Country of Publication: United States
- Language: English
Similar Records
- OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model (Conference · July 10, 2024 · OSTI ID: 2439115)
- A Framework for Neural Network Inference on FPGA-Centric SmartNICs (Conference · September 30, 2022 · OSTI ID: 1964158)
- RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing (Conference · April 27, 2024 · OSTI ID: 2446788)