
Title: Indicator-directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems

Conference

Modern high-performance and warehouse computing centers have a strong interest in minimizing system power consumption while satisfying customers’ quality of service (QoS). Dynamic voltage and frequency scaling (DVFS) is effective for achieving this goal. Nevertheless, automating the process online and making it transparent to users requires addressing three major challenges: (1) Complexity: today’s hardware components (e.g., CPUs, GPUs, memory, and network) can each be configured in several or dozens of frequency/voltage states to satisfy divergent system demands. Given their combinations and the emergence of heterogeneity, searching the design space online for the optimal configuration can be time-consuming. (2) QoS guarantee: user-defined objectives such as power constraints and performance targets must be monitored, predicted, and ensured on a best-effort basis. (3) Adaptability: various known and unknown workloads run on these systems. Workload characteristics should be determined quickly, and configurations adjusted dynamically in accordance with the workloads and QoS. In this work, we focus on applications that are iterative or periodic, a feature common among conventional HPC and emerging machine learning workloads. We propose an online dynamic power-performance (ODPP) management framework that dynamically adjusts GPU DVFS configurations to meet performance and power objectives and constraints, without any code annotation or intrusion. In particular, ODPP extracts performance and power indicators for applications from their resource utilization profiles over a short episode. It then automatically constructs an accurate model that infers from the indicators how the application's performance and power vary with GPU core and memory frequencies. Aided by the model, ODPP can quickly determine the most appropriate DVFS configuration for both seen and unseen applications.
We evaluate ODPP on an NVIDIA GPU using multiple Exascale Computing Project (ECP) and deep learning applications.
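
The final step described in the abstract, choosing a DVFS configuration from a model's predictions, can be illustrated with a minimal sketch. This is not the ODPP implementation: the function names, the (core, memory) frequency pairs, and the linear toy model below are all hypothetical stand-ins, and the selection rule (lowest predicted power that still meets a performance floor) is only one plausible reading of "most appropriate".

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical representation of a GPU DVFS setting: (core MHz, memory MHz).
Config = Tuple[int, int]


def pick_config(
    candidates: Iterable[Config],
    predict: Callable[[Config], Tuple[float, float]],
    min_perf: float,
) -> Optional[Config]:
    """Return the lowest-power candidate whose predicted performance
    is at least min_perf, or None if no candidate qualifies."""
    best = None
    best_power = float("inf")
    for cfg in candidates:
        perf, power = predict(cfg)  # (relative performance, watts)
        if perf >= min_perf and power < best_power:
            best, best_power = cfg, power
    return best


def toy_predict(cfg: Config) -> Tuple[float, float]:
    """Made-up linear model standing in for a learned predictor.
    The coefficients are illustrative only, not measured data."""
    core, mem = cfg
    perf = 0.6 * core / 1500 + 0.4 * mem / 900  # 1.0 at the top frequencies
    power = 50 + 0.08 * core + 0.05 * mem       # watts
    return perf, power


# Assumed frequency grid; real GPUs expose their own supported clock pairs.
candidates = [(c, m) for c in (900, 1200, 1500) for m in (600, 900)]
chosen = pick_config(candidates, toy_predict, min_perf=0.85)
print(chosen)
```

With this toy model, three of the six settings meet the 0.85 performance floor, and the sketch returns the one with the lowest predicted power. An exhaustive scan is adequate for a grid this small; a larger design space would call for a smarter search.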

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1661894
Report Number(s):
PNNL-SA-148280
Resource Relation:
Conference: The 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), May 11-14, 2020, Melbourne, Australia
Country of Publication:
United States
Language:
English

Similar Records

Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · Thu Aug 24 00:00:00 EDT 2017 · OSTI ID:1661894

Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Journal Article · Sat May 05 00:00:00 EDT 2018 · Future Generations Computer Systems · OSTI ID:1661894

XUnified: A Framework for Guiding Optimal Use of GPU Unified Memory
Journal Article · Sat Jan 01 00:00:00 EST 2022 · IEEE Access · OSTI ID:1661894
