Indicator-directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems
- Clemson University
- BATTELLE (PACIFIC NW LAB)
Modern high-performance and warehouse computing centers show strong interest in minimizing system power consumption while satisfying customers' quality-of-service (QoS) requirements. Dynamic voltage and frequency scaling (DVFS) is effective for achieving this goal. Nevertheless, automating the process online and making it transparent to users must address three major challenges: (1) Complexity: today's hardware components (e.g., CPUs, GPUs, memory, and network) can each be configured in several to dozens of frequency/voltage states to satisfy divergent system demands. Given their combinations and the emergence of heterogeneity, searching for the optimal configuration in the design space online can be time-consuming. (2) QoS guarantees: user-defined objectives such as power constraints and performance targets must be monitored, predicted, and ensured on a best-effort basis. (3) Adaptability: various known and unknown workloads run on these systems; workload characteristics should be quickly determined and configurations dynamically adjusted in accordance with the workloads and QoS. In this work, we focus on applications exhibiting an interesting feature, iterative or periodic behavior, which is common among conventional HPC and emerging machine learning workloads. We propose an online dynamic power-performance (ODPP) management framework that dynamically adjusts GPU DVFS configurations to meet performance and power objectives and constraints, without any code annotation or intrusion. In particular, ODPP extracts performance and power indicators for applications from their resource utilization profiles over a short episode. It then automatically constructs an accurate model that infers from the indicators how the application's performance and power vary with GPU core and memory frequencies. Aided by the model, ODPP can quickly determine the most appropriate DVFS configuration for the execution of both seen and unseen applications.
We evaluate ODPP on an NVIDIA GPU using multiple Exascale Computing Project (ECP) and deep learning applications.
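To illustrate the kind of indicator-directed selection the abstract describes, the sketch below enumerates candidate GPU core/memory frequency pairs and picks the one with the highest predicted performance under a power cap. This is not the paper's actual ODPP model: the indicator names, the toy linear model, its coefficients, and the frequency lists are all hypothetical placeholders standing in for a learned model and driver-reported DVFS states.

```python
# Illustrative sketch (not ODPP's actual model): choose a GPU
# core/memory DVFS pair that maximizes predicted performance under a
# power cap, given indicators profiled over a short episode.
from itertools import product

# Hypothetical candidate DVFS states (MHz); a real system would query
# these from the GPU driver (e.g., via NVML).
CORE_FREQS = [607, 810, 1012, 1215, 1417]
MEM_FREQS = [405, 810, 2505]

def predict(indicators, f_core, f_mem):
    """Toy linear model: performance scales with the compute indicator
    times core frequency plus the memory indicator times memory
    frequency; power grows with both frequencies (toy coefficients)."""
    perf = (indicators["compute_util"] * f_core
            + indicators["mem_util"] * f_mem)
    power = 50.0 + 0.05 * f_core + 0.02 * f_mem  # watts
    return perf, power

def best_config(indicators, power_cap):
    """Return the frequency pair with the highest predicted performance
    whose predicted power stays under the cap, or None if none fits."""
    best, best_perf = None, float("-inf")
    for f_core, f_mem in product(CORE_FREQS, MEM_FREQS):
        perf, power = predict(indicators, f_core, f_mem)
        if power <= power_cap and perf > best_perf:
            best, best_perf = (f_core, f_mem), perf
    return best

# Example: a compute-bound episode profile under a 150 W cap.
print(best_config({"compute_util": 0.9, "mem_util": 0.2}, power_cap=150.0))
```

Because the candidate space is small (core states times memory states), exhaustive enumeration per episode is cheap; the expensive part in practice is building an accurate indicator-to-performance/power model, which ODPP constructs automatically from short profiling episodes.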
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1661894
- Report Number(s):
- PNNL-SA-148280
- Resource Relation:
- Conference: The 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), May 11-14, 2020, Melbourne, Australia
- Country of Publication:
- United States
- Language:
- English