RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing
- University of California, Santa Barbara
- Amazon
- Pacific Northwest National Laboratory (Battelle)
- University of California, San Diego
Ensuring high-quality recommendations for newly onboarded users requires continuous retraining of Deep Learning Recommendation Models (DLRMs) with freshly generated data. To serve online DLRM retraining, existing solutions dedicate hundreds of CPU computing nodes to input preprocessing, incurring power consumption that surpasses even that of the GPU trainers. To this end, we propose RAP, an end-to-end DLRM training framework that supports Resource-aware Automated GPU sharing for DLRM input Preprocessing and Training. The core idea of RAP is to accurately capture the GPU computing resources left over during DLRM training and use them for input preprocessing, achieving superior training efficiency without requiring additional resources. Specifically, RAP utilizes a co-running cost model to efficiently assess the costs of various input preprocessing operations, and it implements a resource-aware horizontal fusion technique that adaptively merges smaller kernels according to GPU availability, avoiding interference with DLRM training. In addition, RAP leverages a heuristic search algorithm that jointly optimizes both the input preprocessing graph mapping and the co-running schedule to maximize end-to-end DLRM training throughput. A comprehensive evaluation shows that RAP achieves a 78.3× speedup on average over CPU-based DLRM input preprocessing frameworks. Moreover, the end-to-end training throughput of RAP is only 2.04% lower than the ideal case, which has no input preprocessing overhead.
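The resource-aware horizontal fusion idea in the abstract can be illustrated as a greedy grouping of small preprocessing kernels under a budget of GPU resources left over by training. The sketch below is an illustrative assumption, not RAP's actual implementation: the function name, the kernel names, and the SM-demand numbers are all hypothetical stand-ins for the paper's cost-model-driven scheduler.

```python
def horizontal_fusion(kernels, free_sms):
    """Greedily pack preprocessing kernels into fused groups.

    kernels:  list of (name, sm_demand) pairs, where sm_demand is a
              hypothetical per-kernel resource estimate (e.g. SMs needed).
    free_sms: GPU resources left over by DLRM training; each fused group
              must stay within this budget so it does not interfere
              with the co-running training kernels.
    """
    groups, current, used = [], [], 0
    # Place larger kernels first so small ones fill the remaining slack.
    for name, demand in sorted(kernels, key=lambda k: k[1], reverse=True):
        if used + demand > free_sms and current:
            # Budget exceeded: close the current fused group.
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += demand
    if current:
        groups.append(current)
    return groups


# Example with made-up preprocessing kernels and 12 leftover SMs:
ops = [("embed_hash", 10), ("fill_null", 4), ("logit_clip", 3), ("bucketize", 6)]
print(horizontal_fusion(ops, free_sms=12))
# → [['embed_hash'], ['bucketize', 'fill_null'], ['logit_clip']]
```

Each printed group would then be launched as one fused kernel, so the preprocessing work never claims more than the leftover budget at a time; RAP's actual system additionally co-optimizes this with the graph mapping and co-running schedule.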
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 2446788
- Report Number(s):
- PNNL-SA-189479
- Country of Publication:
- United States
- Language:
- English
Similar Records
MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems
Conference · May 2020 · OSTI ID: 1649080
RACB: Resource Aware Cache Bypass on GPUs
Conference · October 2014 · 2014 International Symposium on Computer Architecture and High Performance Computing Workshop; 22-24 Oct. 2014; Paris, France · OSTI ID: 1567596
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training
Conference · June 2023 · OSTI ID: 1988131