Quantifying Uncertainty in HPC Job Queue Time Predictions
High Performance Computing (HPC) has developed at an unprecedented pace in recent decades. This growth has demanded corresponding development in HPC Operational Data Analytics (ODA), which encompasses a wide range of data analysis techniques, ML/AI efforts, tools, and visualizations. Published ODA studies offer a variety of practical ways to inform HPC users, administrators, procurement managers, and other stakeholders. Uncertainty analysis, however, is rare in the published literature. For instance, we identify only one of 14 existing studies on job queue time prediction that investigates the uncertainty of its proposed predictions. Uncertainty quantification matters in such predictive analytics solutions because it affects how users interpret the information they receive, and we attempt to bridge this gap. To improve access to such insights, we develop a process for determining upper and lower bounds on the queue times predicted by a regression model at a specified confidence level. Our current research focuses on the uncertainty in predicting job queue times, yet our approach may also be applied to predicting other metrics.
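The abstract does not say how the bounds are computed, so the following is only an illustrative sketch of one common way to attach upper and lower bounds to a regression model's point predictions at a chosen confidence level. It uses absolute residuals on a held-out calibration set, in the style of split conformal prediction, which is an assumption made here and not necessarily the authors' process. The function name `residual_interval`, the sample numbers, and the choice of minutes as the unit are all hypothetical.

```python
import numpy as np


def residual_interval(cal_true, cal_pred, new_pred, confidence=0.9):
    """Return (lower, upper) bounds around new point predictions.

    Hypothetical helper: the half-width is an empirical quantile of the
    absolute residuals observed on a held-out calibration set.
    """
    residuals = np.abs(np.asarray(cal_true, dtype=float)
                       - np.asarray(cal_pred, dtype=float))
    n = residuals.size
    sorted_res = np.sort(residuals)

    # Rank of the residual to use, with the (n + 1) finite-sample
    # correction commonly applied in split conformal prediction.
    k = int(np.ceil((n + 1) * confidence))

    # If the calibration set is too small for the requested confidence,
    # no finite half-width can be justified from the data.
    half_width = sorted_res[k - 1] if k <= n else np.inf

    new_pred = np.asarray(new_pred, dtype=float)
    lower = np.maximum(new_pred - half_width, 0.0)  # queue time is non-negative
    upper = new_pred + half_width
    return lower, upper


# Made-up calibration data: observed vs. predicted queue times (minutes).
observed = [10, 20, 30, 40, 50, 60, 70, 80, 90]
predicted = [12, 18, 33, 36, 55, 54, 77, 72, 99]

lower, upper = residual_interval(observed, predicted, [5, 45], confidence=0.75)
print(lower, upper)
```

A symmetric, constant-width interval like this is the simplest option. Queue times are often heavily skewed, so in practice asymmetric or input-dependent bounds (for example from quantile regression) may fit better.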
- Research Organization: National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Organization: USDOE National Renewable Energy Laboratory (NREL)
- DOE Contract Number: AC36-08GO28308
- OSTI ID: 2433908
- Report Number(s): NREL/CP-2C00-90232; MainId:92010; UUID:66619b78-4886-469f-a4ed-c596ed89b338; MainAdminId:72830
- Country of Publication: United States
- Language: English
Similar Records
Tandem Predictions for HPC Jobs
Conference · Wed Jul 17 00:00:00 EDT 2024 · OSTI ID: 2447811
A Conceptual Framework for HPC Operational Data Analytics
Conference · Wed Sep 01 00:00:00 EDT 2021 · OSTI ID: 1820791
JobQueue-PG: A Task Queue for Coordinating Varied Tasks Across Multiple HPC Resources and HPC Jobs
Software · Thu May 12 20:00:00 EDT 2022 · OSTI ID: code-74434