HPC Usage Behavior Analysis and Performance Estimation with Machine Learning Techniques
- ORNL
Most researchers with little high performance computing (HPC) experience have difficulties productively using the supercomputing resources. To address this issue, we investigated usage behaviors of the world s fastest academic Kraken supercomputer, and built a knowledge-based recommendation system to improve user productivity. Six clustering techniques, along with three cluster validation measures, were implemented to investigate the underlying patterns of usage behaviors. Besides manually defining a category for very large job submissions, six behavior categories were identified, which cleanly separated the data intensive jobs and computational intensive jobs. Then, job statistics of each behavior category were used to develop a knowledge-based recommendation system that can provide users with instructions about choosing appropriate software packages, setting job parameter values, and estimating job queuing time and runtime. Experiments were conducted to evaluate the performance of the proposed recommendation system, which included 127 job submissions by users from different research fields. Great feedback indicated the usefulness of the provided information. The average runtime estimation accuracy of 64.2%, with 28.9% job termination rate, was achieved in the experiments, which almost doubled the average accuracy in the Kraken dataset.
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE; Work for Others (WFO)
- DOE Contract Number:
- DE-AC05-00OR22725
- OSTI ID:
- 1091680
- Resource Relation:
- Conference: 18th International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, USA, 20120716, 20120719
- Country of Publication:
- United States
- Language:
- English
Similar Records
Characteristics of workload on ASCI blue-pacific at lawrence livermore national laboratory
Quantifying the Impact of Advanced Web Platforms on High Performance Computing Usage