
Title: HPC Usage Behavior Analysis and Performance Estimation with Machine Learning Techniques

Conference
OSTI ID:1091680

Most researchers with little high performance computing (HPC) experience have difficulty using supercomputing resources productively. To address this issue, we investigated usage behaviors on Kraken, the world's fastest academic supercomputer, and built a knowledge-based recommendation system to improve user productivity. Six clustering techniques, along with three cluster validation measures, were implemented to investigate the underlying patterns of usage behavior. In addition to a manually defined category for very large job submissions, six behavior categories were identified, which cleanly separated data-intensive jobs from computationally intensive jobs. Job statistics for each behavior category were then used to develop a knowledge-based recommendation system that gives users guidance on choosing appropriate software packages, setting job parameter values, and estimating job queuing time and runtime. Experiments comprising 127 job submissions by users from different research fields were conducted to evaluate the proposed recommendation system. Positive user feedback indicated that the provided information was useful. The experiments achieved an average runtime estimation accuracy of 64.2%, with a 28.9% job termination rate, almost double the average accuracy in the Kraken dataset.
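The abstract does not name the six clustering techniques or the three validation measures. As a rough illustration of the general idea only, the sketch below assumes k-means as the clustering technique and the mean silhouette coefficient as the validation measure, and uses it to pick the number of behavior categories. The job features (two made-up numeric columns per job) and all function names are hypothetical and are not taken from the paper or the Kraken dataset.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each job goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (an assumed validation measure)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

# Hypothetical job records: (compute-load score, I/O-load score).
jobs = [(1.0, 9.0), (1.2, 8.5), (0.9, 9.4), (1.1, 8.8),
        (8.0, 1.0), (8.3, 1.2), (7.9, 0.8), (8.2, 1.1)]

# Choose the cluster count whose partition scores best on the measure.
best_k = max(range(2, 5), key=lambda k: silhouette(jobs, kmeans(jobs, k)))
best_labels = kmeans(jobs, best_k)
```

On this toy data the two groups stand in for data-intensive and computationally intensive jobs. A real analysis would compare several techniques and measures, as the abstract describes, rather than rely on a single pairing.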

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; Work for Others (WFO)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1091680
Resource Relation:
Conference: 18th International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, USA, July 16-19, 2012
Country of Publication:
United States
Language:
English