Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems
Journal Article · The Journal of Supercomputing
- Idaho National Laboratory
- University of Idaho
Resource allocation in High Performance Computing (HPC) environments presents a complex, multifaceted challenge for job scheduling algorithms. Beyond allocating system resources efficiently, schedulers must account for and optimize multiple performance metrics, including job wait time and system throughput. Traditional heuristic-based scheduling algorithms increasingly lack the efficiency needed to meet the demands, complexity, and scale of modern HPC systems. Consequently, recent research efforts have focused on leveraging advances in Artificial Intelligence (AI) and Deep Learning (DL), particularly Reinforcement Learning (RL), to develop more adaptable and intelligent scheduling strategies. Previous RL-based scheduling approaches have explored a range of algorithms, from Deep Q-Networks (DQN) to Proximal Policy Optimization (PPO), and, more recently, hybrid methods that integrate Graph Neural Networks (GNNs) with RL techniques. However, a common limitation across these methods is their reliance on relatively small datasets; few have been evaluated on large-scale, multi-million-job trace datasets representative of real-world HPC workloads. Moreover, existing RL schedulers face scalability issues due to centralized policy updates, which hinder training efficiency and performance when applied to large datasets. This study introduces a novel RL-based scheduler built on the Decentralized Distributed Proximal Policy Optimization (DD-PPO) algorithm, which supports large-scale distributed training across multiple workers without requiring parameter synchronization at every step. By eliminating reliance on centralized updates to a shared policy, the DD-PPO scheduler improves scalability, training efficiency, and sample utilization. Experimental validation using a large real-world dataset of over 11.5 million job traces, collected from petascale HPC systems over six years, assesses the influence of dataset scale on training effectiveness and compares DD-PPO against traditional and advanced scheduling approaches. The results demonstrate improved scheduling performance relative to both heuristic-based schedulers and existing RL-based scheduling algorithms.
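To illustrate the decentralized update scheme described in the abstract, the sketch below shows a DD-PPO-style training loop in which each worker collects its own rollouts and synchronizes gradients through an all-reduce rather than a central parameter server. This is a minimal illustration, not the paper's implementation: the observation and action sizes, the toy network, the placeholder sample_rollout environment stub, and the use of PyTorch DistributedDataParallel are all assumptions made for the example.

```python
# Minimal sketch of a DD-PPO-style update loop (assumed details, not the paper's code).
# Each worker owns a local policy replica, gathers its own experience, and gradients
# are synchronized by all-reduce during backward(); there is no central parameter server.
# Launch with, e.g.:  torchrun --nproc_per_node=2 ddppo_sketch.py
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributions import Categorical
from torch.nn.parallel import DistributedDataParallel as DDP

OBS_DIM, N_ACTIONS = 32, 8  # hypothetical scheduler state / action sizes


class PolicyValueNet(nn.Module):
    """Shared-trunk actor-critic network (toy architecture for illustration)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.pi = nn.Linear(64, N_ACTIONS)   # policy logits
        self.v = nn.Linear(64, 1)            # state-value head

    def forward(self, obs):
        h = self.trunk(obs)
        return self.pi(h), self.v(h).squeeze(-1)


def sample_rollout(policy, steps=256):
    """Stand-in for interacting with an HPC scheduling environment:
    each worker gathers its own experience locally (random data here)."""
    obs = torch.randn(steps, OBS_DIM)
    with torch.no_grad():
        logits, values = policy(obs)
        dist_a = Categorical(logits=logits)
        actions = dist_a.sample()
        logp_old = dist_a.log_prob(actions)
    returns = torch.randn(steps)      # placeholder returns
    advantages = returns - values     # crude advantage estimate
    return obs, actions, logp_old, returns, advantages


def ppo_update(policy, optimizer, batch, clip=0.2):
    """One PPO update on locally collected data; DDP all-reduces the gradients."""
    obs, actions, logp_old, returns, adv = batch
    logits, values = policy(obs)
    dist_a = Categorical(logits=logits)
    ratio = torch.exp(dist_a.log_prob(actions) - logp_old)
    # Standard clipped surrogate objective.
    pg_loss = -torch.min(ratio * adv,
                         torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
    v_loss = (returns - values).pow(2).mean()
    loss = pg_loss + 0.5 * v_loss - 0.01 * dist_a.entropy().mean()
    optimizer.zero_grad()
    loss.backward()   # gradient all-reduce happens here, across all workers
    optimizer.step()


def main():
    dist.init_process_group("gloo")    # each worker joins the process group
    torch.manual_seed(dist.get_rank())
    policy = DDP(PolicyValueNet())     # local replica, kept in sync via all-reduce
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for _ in range(10):                # per worker: rollout -> local PPO update
        batch = sample_rollout(policy)
        ppo_update(policy, optimizer, batch)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In this scheme each worker performs the same rollout-then-update cycle on its own data; synchronization is limited to the gradient all-reduce inside backward(), which is what removes the centralized policy-update bottleneck the abstract refers to.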
- Research Organization:
- Idaho National Laboratory (INL), Idaho Falls, ID (United States)
- Sponsoring Organization:
- USDOE Office of Nuclear Energy (NE)
- Grant/Contract Number:
- AC07-05ID14517
- OSTI ID:
- 2575548
- Report Number(s):
- INL/JOU-25-84366
- Journal Information:
- The Journal of Supercomputing, Vol. 1, Issue 1
- Country of Publication:
- United States
- Language:
- English
Similar Records
- PPO And Friends
- DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing · Software · May 30, 2024 · OSTI ID: code-140520
- DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing · Journal Article · IEEE Transactions on Parallel and Distributed Systems · Sep 15, 2022 · OSTI ID: 1984484