Learning from Five-year Resource-Utilization Data of Titan System
- ORNL
Titan was the flagship supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). It was deployed in late 2012, became the world's fastest supercomputer, and was retired on August 2, 2019. With Titan's mission complete, this paper provides a first-order examination of how its critical resources (CPU, memory, GPU, and I/O) were used over a five-year production period (2015-2019). In particular, we show quantitatively that the majority of CPU time was spent on large-scale jobs, which is consistent with the policy of driving ground-breaking science through leadership computing. We also corroborate the general observation of low CPU-memory usage: 95% of jobs used 15% or less of the available memory. Additionally, we correlate the increase in total job submissions and the decrease in GPU-enabled jobs during 2016 with the GPU reliability issue that affected large-scale runs. We further report a surprising read/write ratio over the five-year period, which contradicts the common assumption that large-scale simulation machines are “write-heavy”. These findings could influence how next-generation large-scale storage systems are designed, and we believe our analyses will be of broad interest to the high-performance computing (HPC) community.
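The three headline metrics in the abstract (share of compute time spent on large-scale jobs, fraction of jobs under a memory-use cutoff, and the aggregate read/write ratio) can be illustrated with a minimal sketch. It assumes a hypothetical per-job record with the fields `nodes`, `wall_hours`, `peak_mem_frac`, `bytes_read`, and `bytes_written`; Titan's actual scheduler and monitoring logs use different formats. It also uses node-hours as a stand-in for CPU time, and the "large-scale" node threshold is a caller-supplied, hypothetical parameter rather than the paper's definition. The toy records are made up for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical per-job fields; real job and I/O logs will differ.
    nodes: int            # nodes allocated to the job
    wall_hours: float     # elapsed wall-clock time in hours
    peak_mem_frac: float  # peak memory used / memory available, in [0, 1]
    bytes_read: int       # total bytes the job read from storage
    bytes_written: int    # total bytes the job wrote to storage


def large_job_share(jobs, large_node_threshold):
    """Fraction of total node-hours (a proxy for CPU time) used by jobs
    allocating at least `large_node_threshold` nodes (hypothetical cutoff)."""
    total = sum(j.nodes * j.wall_hours for j in jobs)
    large = sum(j.nodes * j.wall_hours for j in jobs
                if j.nodes >= large_node_threshold)
    return large / total if total else 0.0


def fraction_at_or_below(jobs, mem_frac_cutoff):
    """Fraction of jobs whose peak memory use is <= the given fraction
    of available memory (e.g. 0.15 for "15% or less")."""
    if not jobs:
        return 0.0
    return sum(1 for j in jobs if j.peak_mem_frac <= mem_frac_cutoff) / len(jobs)


def read_write_ratio(jobs):
    """Aggregate bytes read divided by aggregate bytes written;
    a value above 1 means the workload is read-heavy overall."""
    read = sum(j.bytes_read for j in jobs)
    written = sum(j.bytes_written for j in jobs)
    return read / written if written else float("inf")


# Made-up toy records, for illustration only.
jobs = [
    JobRecord(nodes=4000, wall_hours=10.0, peak_mem_frac=0.10,
              bytes_read=800, bytes_written=200),
    JobRecord(nodes=16, wall_hours=2.0, peak_mem_frac=0.05,
              bytes_read=50, bytes_written=100),
    JobRecord(nodes=1, wall_hours=1.0, peak_mem_frac=0.40,
              bytes_read=10, bytes_written=5),
]

print("large-job share of node-hours:", large_job_share(jobs, 3750))
print("jobs using <= 15% of memory:", fraction_at_or_below(jobs, 0.15))
print("read/write ratio:", read_write_ratio(jobs))
```

Weighting by node-hours rather than by job count is what lets a handful of very large jobs dominate the compute share, while the memory cutoff is counted per job; a real analysis would also need to decide how to handle failed or zero-length jobs.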
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1648993
- Resource Relation:
- Conference: 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, New Mexico, United States of America, September 23-26, 2019
- Country of Publication:
- United States
- Language:
- English
Similar Records
SMC 2021 : Analyzing Resource Utilization and User Behavior on Titan Supercomputer
SMC 2021 Data Challenge: Analyzing Resource Utilization and User Behavior on Titan Supercomputer