Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems
Journal Article · The Journal of Supercomputing
- Idaho National Laboratory
- University of Idaho
Resource allocation in High Performance Computing (HPC) environments presents a complex, multifaceted challenge for job scheduling algorithms. Beyond allocating system resources efficiently, schedulers must account for and optimize multiple performance metrics, including job wait time and system throughput. Traditional heuristic-based scheduling algorithms increasingly lack the efficiency needed to meet the demands, complexity, and scale of modern HPC systems. Consequently, recent research efforts have focused on leveraging advances in Artificial Intelligence (AI) and Deep Learning (DL), particularly Reinforcement Learning (RL), to develop more adaptable and intelligent scheduling strategies. Previous RL-based scheduling approaches have explored a range of algorithms, from Deep Q-Networks (DQN) to Proximal Policy Optimization (PPO), and, more recently, hybrid methods that integrate Graph Neural Networks (GNNs) with RL techniques. However, a common limitation across these methods is their reliance on relatively small datasets; few have been evaluated on large-scale, multi-million-job trace datasets representative of real-world HPC workloads. Moreover, existing RL schedulers face scalability issues due to centralized policy updates, which hinder training efficiency and performance when applied to large datasets. This study introduces a novel RL-based scheduler built on the Decentralized Distributed Proximal Policy Optimization (DD-PPO) algorithm, which supports large-scale distributed training across multiple workers without requiring parameter synchronization at every step. By eliminating reliance on centralized updates to a shared policy, the DD-PPO scheduler improves scalability, training efficiency, and sample utilization. Experimental validation using a large real-world dataset of over 11.5 million job traces, collected from petascale HPC systems over six years, assesses the influence of dataset scale on training effectiveness and compares DD-PPO against traditional and advanced scheduling approaches. The results demonstrate improved scheduling performance relative to both heuristic-based schedulers and existing RL-based scheduling algorithms.
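To illustrate the decentralized update scheme described in the abstract, the sketch below shows a DD-PPO-style training loop in which each worker collects its own rollouts and synchronizes gradients through an all-reduce rather than a central parameter server. This is a minimal illustration, not the paper's implementation: the observation and action sizes, the toy network, the placeholder sample_rollout environment stub, and the use of PyTorch DistributedDataParallel are all assumptions made for the example.

```python
# Minimal sketch of a DD-PPO-style update loop (assumed details, not the paper's code).
# Each worker owns a local policy replica, gathers its own experience, and gradients
# are synchronized by all-reduce during backward(); there is no central parameter server.
# Launch with, e.g.:  torchrun --nproc_per_node=2 ddppo_sketch.py
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributions import Categorical
from torch.nn.parallel import DistributedDataParallel as DDP

OBS_DIM, N_ACTIONS = 32, 8  # hypothetical scheduler state / action sizes


class PolicyValueNet(nn.Module):
    """Shared-trunk actor-critic network (toy architecture for illustration)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.pi = nn.Linear(64, N_ACTIONS)   # policy logits
        self.v = nn.Linear(64, 1)            # state-value head

    def forward(self, obs):
        h = self.trunk(obs)
        return self.pi(h), self.v(h).squeeze(-1)


def sample_rollout(policy, steps=256):
    """Stand-in for interacting with an HPC scheduling environment:
    each worker gathers its own experience locally (random data here)."""
    obs = torch.randn(steps, OBS_DIM)
    with torch.no_grad():
        logits, values = policy(obs)
        dist_a = Categorical(logits=logits)
        actions = dist_a.sample()
        logp_old = dist_a.log_prob(actions)
    returns = torch.randn(steps)      # placeholder returns
    advantages = returns - values     # crude advantage estimate
    return obs, actions, logp_old, returns, advantages


def ppo_update(policy, optimizer, batch, clip=0.2):
    """One PPO update on locally collected data; DDP all-reduces the gradients."""
    obs, actions, logp_old, returns, adv = batch
    logits, values = policy(obs)
    dist_a = Categorical(logits=logits)
    ratio = torch.exp(dist_a.log_prob(actions) - logp_old)
    # Standard clipped surrogate objective.
    pg_loss = -torch.min(ratio * adv,
                         torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
    v_loss = (returns - values).pow(2).mean()
    loss = pg_loss + 0.5 * v_loss - 0.01 * dist_a.entropy().mean()
    optimizer.zero_grad()
    loss.backward()   # gradient all-reduce happens here, across all workers
    optimizer.step()


def main():
    dist.init_process_group("gloo")    # each worker joins the process group
    torch.manual_seed(dist.get_rank())
    policy = DDP(PolicyValueNet())     # local replica, kept in sync via all-reduce
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for _ in range(10):                # per worker: rollout -> local PPO update
        batch = sample_rollout(policy)
        ppo_update(policy, optimizer, batch)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In this scheme each worker performs the same rollout-then-update cycle on its own data; synchronization is limited to the gradient all-reduce inside backward(), which is what removes the centralized policy-update bottleneck the abstract refers to.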
- Research Organization:
- Idaho National Laboratory (INL), Idaho Falls, ID (United States)
- Sponsoring Organization:
- USDOE Office of Nuclear Energy (NE)
- Grant/Contract Number:
- AC07-05ID14517
- OSTI ID:
- 2575548
- Report Number(s):
- INL/JOU-25-84366
- Journal Information:
- The Journal of Supercomputing, Vol. 1, Issue 1
- Country of Publication:
- United States
- Language:
- English
Similar Records
- PPO And Friends
- DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing · Software · May 30, 2024 · OSTI ID: code-140520
- DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing · Journal Article · IEEE Transactions on Parallel and Distributed Systems · Sep 15, 2022 · OSTI ID: 1984484