Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, USA
- Department of Game Design and Development, Sangmyung University, Seoul, South Korea
Burst Buffer is widely used in supercomputing centers to bridge the performance gap between computational power and high-performance I/O systems. The primary role of Burst Buffer is to temporarily absorb bursty I/O and reduce heavy access to the Parallel File System (PFS). However, job resource managers on High-Performance Computing (HPC) systems typically use a dedicated Burst Buffer allocation approach, which leaves the Burst Buffer resource severely underutilized. To improve the efficiency of the expensive Burst Buffer resource, we analyze I/O patterns on Burst Buffer in depth. We propose a Burst Buffer Over-Subscription (BBOS) allocation method, which improves Burst Buffer utilization by allowing each job to access Burst Buffer only during its I/O phases, so that the allocations of different jobs can overlap. Furthermore, we develop a new I/O congestion-aware scheduler and a transparent data management system between Burst Buffer and PFS. Our approach also reduces the memory overhead and improves the data persistence of the data management system by adopting persistent memory. With the proposed approach, Burst Buffer utilization improves, and HPC applications achieve high I/O performance by exploiting the powerful Burst Buffer hardware. Experimental results show that BBOS can improve Burst Buffer utilization by up to 120% while guaranteeing more stable and higher checkpoint performance than other state-of-the-art schedulers, even under high I/O loads. In addition, our approach can improve the hit ratio of restart requests by up to 96.4% and provides up to 210% higher restart throughput on Burst Buffer.
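The difference between dedicated and over-subscribed allocation can be illustrated with a small sketch. It is not the BBOS scheduler described in the paper: the `Job` fields, the time units, the capacities, and the assumption that a job's data is drained to the PFS as soon as each I/O phase ends are all hypothetical choices made for illustration. The sketch compares the peak Burst Buffer capacity that must be reserved when each job holds its allocation for its whole lifetime with the peak when it holds it only during its I/O phases.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start: int        # job start time (arbitrary time units, hypothetical)
    end: int          # job end time
    io_phases: list   # (begin, end) intervals in which the job performs I/O
    bb_gb: int        # Burst Buffer capacity the job requests, in GB


def peak_demand(intervals):
    """Return the peak total capacity over half-open (begin, end, gb) intervals."""
    events = []
    for begin, end, gb in intervals:
        events.append((begin, gb))    # capacity acquired at `begin`
        events.append((end, -gb))     # capacity released at `end`
    # At equal timestamps, process releases (negative deltas) before acquisitions.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def dedicated_peak(jobs):
    # Dedicated allocation: capacity is held for the job's entire lifetime.
    return peak_demand([(j.start, j.end, j.bb_gb) for j in jobs])


def oversubscribed_peak(jobs):
    # Over-subscription (simplified): capacity is held only during I/O phases,
    # assuming data is drained to the PFS when each phase ends.
    return peak_demand([(b, e, j.bb_gb) for j in jobs for b, e in j.io_phases])


# Three hypothetical jobs that run concurrently but checkpoint at different times.
jobs = [
    Job("A", 0, 100, [(40, 50), (90, 100)], 100),
    Job("B", 0, 100, [(20, 30), (70, 80)], 100),
    Job("C", 0, 100, [(45, 55)], 100),
]

print("dedicated peak (GB):     ", dedicated_peak(jobs))
print("over-subscribed peak (GB):", oversubscribed_peak(jobs))
```

In this toy workload, the gap between the two peaks is capacity that a dedicated scheme reserves but leaves idle during compute phases. A real scheduler would also have to handle I/O congestion and data movement between Burst Buffer and PFS, which this sketch ignores.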
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Research Foundation of Korea (NRF); Korean Government (MSIT)
- Grant/Contract Number:
- AC02-05CH11231; NRF-2016M3C4A7952587; 4199990214639; NRF-2021R1F1A1063438
- OSTI ID:
- 1908919
- Alternate ID(s):
- OSTI ID: 1908920; OSTI ID: 1983925
- Journal Information:
- IEEE Access, Vol. 11; ISSN 2169-3536
- Publisher:
- Institute of Electrical and Electronics Engineers
- Country of Publication:
- United States
- Language:
- English
Similar Records
An empirical study of I/O separation for burst buffers in HPC systems
SPARC: Demonstrate burst-buffer-based checkpoint/restart on ATS-1.