Performance Profile of Transformer Fine-Tuning in Multi-GPU Cloud Environments
This study examines the performance characteristics and trade-offs of running machine-learning workloads in multi-GPU environments, both on on-premises computing resources and on a commercial cloud service (Azure). Specifically, it evaluates the training and fine-tuning of transformer-based deep-learning (DL) models on clinical notes and related data, a task of critical importance in the medical domain. To this end, we run DL experiments on the widely deployed NVIDIA V100 GPUs and on the newer A100 GPUs, connected via NVLink or PCIe. The study breaks down the execution time of the major operations involved in training DL models and investigates popular options for optimizing each of them. We present findings on how individual operations (e.g., loading data onto the GPUs, training, fine-tuning), optimizations, and system configurations (single vs. multi-GPU, NVLink vs. PCIe) affect overall training performance.
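The kind of per-operation breakdown the abstract describes can be sketched with a simple phase timer that accumulates wall-clock time for each stage of a training step (data loading, host-to-device copy over PCIe or NVLink, forward/backward compute). This is a minimal illustrative harness, not the paper's actual instrumentation; all names (`PhaseTimer`, the phase labels) are assumptions, and the `time.sleep` calls stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Accumulates wall-clock seconds per named phase of a training step."""
    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def measure(self, phase):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[phase] += time.perf_counter() - start

    def shares(self):
        """Fraction of total measured time spent in each phase."""
        total = sum(self.totals.values())
        return {phase: t / total for phase, t in self.totals.items()}

timer = PhaseTimer()
for step in range(3):
    with timer.measure("data_loading"):
        time.sleep(0.002)   # stand-in for reading/tokenizing a batch
    with timer.measure("h2d_copy"):
        time.sleep(0.001)   # stand-in for CPU->GPU transfer (PCIe or NVLink)
    with timer.measure("compute"):
        time.sleep(0.004)   # stand-in for the forward/backward pass

shares = timer.shares()
```

In a real GPU run, the phase bodies would wrap the data loader, the tensor `.to(device)` copies, and the training step, making it straightforward to see how interconnect choice shifts the `h2d_copy` share.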
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1883970
- Country of Publication: United States
- Language: English
Similar Records
- Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. Journal Article, IEEE Transactions on Parallel and Distributed Systems, Dec 31, 2019. OSTI ID: 1598812
- Matrix Product (GEMM) Performance Data from GPUs. Dataset, Sep 9, 2021. OSTI ID: 1819195
- Evaluating On-Node GPU Interconnects for Deep Learning Workloads. Conference, Dec 31, 2017. OSTI ID: 1525777