Is Knowledge about Running Applications Helping Improve Runtime Prediction of HPC Jobs?
High-performance computing systems rely upon scheduling algorithms to achieve high utilization. These schedulers rely upon user estimates of job resource requirements, such as runtime, to determine optimal scheduling of incoming jobs. These user estimates, however, are prone to error. To mitigate this error, significant research has been directed at providing better estimates of job runtime, usually employing machine learning techniques. These techniques are dependent upon the input features selected. Among the possible features is the primary application used by the job. In a survey of more than 20 papers directed at improving runtime prediction, only four included primary application as an input feature. We focus this investigation specifically on the value of adding primary application as an input feature, and find that it does improve model performance, especially for jobs with longer runtimes, though this improvement varies based on the application used. We recommend further research to determine the cause of this variability as well as an optimal strategy for employing a mixture of models both including and not including primary application as a feature.
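The question the abstract poses, whether primary application adds predictive value, can be illustrated with a minimal sketch. It is not the authors' method: the group-mean predictor stands in for the machine learning models the paper evaluates, and the job records, field layout, and function names (`fit_group_means`, `mean_abs_error`, `compare`) are all hypothetical. It compares the prediction error of a model keyed on user alone with one keyed on user plus primary application.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy job records (invented for illustration, not from the paper):
# (user, primary_application, requested_minutes, actual_minutes)
TRAIN = [
    ("alice", "vasp", 240, 200),
    ("alice", "vasp", 240, 220),
    ("alice", "python", 240, 20),
    ("alice", "python", 240, 30),
    ("bob", "lammps", 120, 90),
    ("bob", "lammps", 120, 110),
    ("bob", "python", 120, 10),
]
TEST = [
    ("alice", "vasp", 240, 210),
    ("alice", "python", 240, 25),
    ("bob", "lammps", 120, 100),
    ("bob", "python", 120, 12),
]


def fit_group_means(jobs, key):
    """Learn the mean actual runtime for each group defined by key(job)."""
    groups = defaultdict(list)
    for job in jobs:
        groups[key(job)].append(job[3])
    return {group: mean(runtimes) for group, runtimes in groups.items()}


def mean_abs_error(jobs, means, key, fallback):
    """Mean absolute error in minutes; unseen groups use the fallback value."""
    return mean(abs(means.get(key(job), fallback) - job[3]) for job in jobs)


def compare(train, test):
    """Return (MAE without application feature, MAE with application feature)."""
    fallback = mean(job[3] for job in train)  # global mean for unseen groups

    def by_user(job):
        return job[0]

    def by_user_app(job):
        return (job[0], job[1])

    mae_without = mean_abs_error(test, fit_group_means(train, by_user), by_user, fallback)
    mae_with = mean_abs_error(test, fit_group_means(train, by_user_app), by_user_app, fallback)
    return mae_without, mae_with


if __name__ == "__main__":
    without_app, with_app = compare(TRAIN, TEST)
    print(f"MAE without application: {without_app:.2f} min")
    print(f"MAE with application:    {with_app:.2f} min")
```

On this invented data the application-aware variant has the lower error because each user runs applications with very different runtimes. Real workloads need not behave this way, which is consistent with the abstract's observation that the improvement varies by application.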
- Research Organization: National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Organization: USDOE National Renewable Energy Laboratory (NREL)
- DOE Contract Number: AC36-08GO28308
- OSTI ID: 2242427
- Report Number(s): NREL/CP-2C00-88316; MainId:89091; UUID:ea739d46-7ac7-436c-afdd-baad9f085205; MainAdminID:71321
- Resource Relation: Conference: Presented at PEARC '23: Practice and Experience in Advanced Research Computing, 23-27 July 2023, Portland, Oregon
- Country of Publication: United States
- Language: English