U.S. Department of Energy
Office of Scientific and Technical Information

Quantifying Uncertainty in HPC Job Queue Time Predictions

Conference
High Performance Computing (HPC) has developed at an unprecedented pace in recent decades. This growth has demanded corresponding development in HPC Operational Data Analytics (ODA), which encompasses a wide range of data analysis techniques, ML/AI efforts, tools, and visualizations. Published ODA studies offer a variety of practical ways to inform HPC users, administrators, procurement managers, and other stakeholders. Uncertainty analysis, however, is rare in this literature: of 14 existing studies on job queue time prediction, we identify only one that investigates the uncertainty of its proposed predictions. Uncertainty quantification matters in such predictive analytics because it affects how users interpret the information they receive, and we attempt to bridge this gap. To improve access to such insights, we develop a process for determining upper and lower bounds on the queue times predicted by a regression model at a specified confidence level. Our current research focuses on uncertainty in job queue time predictions, but the approach may also be applied to predictions of other metrics.
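The abstract does not say how the bounds are computed, so the following is only an illustrative sketch of one common way to attach upper and lower bounds to a regression prediction at a chosen confidence level. It uses split conformal prediction, which is an assumption here and not necessarily the authors' process. The function names (`conformal_half_width`, `queue_time_bounds`) and the sample queue times in minutes are hypothetical.

```python
import math


def conformal_half_width(cal_actual, cal_predicted, confidence=0.9):
    """Interval half-width from held-out calibration jobs (split conformal).

    cal_actual / cal_predicted are observed and predicted queue times for
    jobs that were NOT used to train the regression model.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    residuals = sorted(abs(a - p) for a, p in zip(cal_actual, cal_predicted))
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration job")
    # Finite-sample corrected rank (1-indexed) of the residual to use.
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        # Too few calibration jobs to support this confidence level.
        return math.inf
    return residuals[rank - 1]


def queue_time_bounds(predicted, half_width):
    """Lower and upper bounds around one prediction; queue time cannot be negative."""
    lower = max(0.0, predicted - half_width)
    upper = predicted + half_width
    return lower, upper


# Hypothetical calibration data: queue times in minutes.
cal_actual = [10, 25, 40, 5, 60, 30, 15, 45, 20]
cal_predicted = [12, 20, 35, 9, 50, 33, 15, 40, 26]

half_width = conformal_half_width(cal_actual, cal_predicted, confidence=0.8)
lower, upper = queue_time_bounds(30, half_width)
print(f"predicted 30 min, 80% bounds: [{lower}, {upper}]")
```

Under this scheme the interval has the same width for every job. Width that varies with job characteristics would need a different technique, such as quantile regression, which is likewise not confirmed by the abstract.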
Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE National Renewable Energy Laboratory (NREL)
DOE Contract Number:
AC36-08GO28308
OSTI ID:
2433908
Report Number(s):
NREL/CP-2C00-90232; MainId:92010; UUID:66619b78-4886-469f-a4ed-c596ed89b338; MainAdminId:72830
Country of Publication:
United States
Language:
English
