Leveraging Pre-Built Catalogs and Object-Level Scheduling to Eliminate I/O Bottlenecks in HPC Environments
- Sogang University, Seoul (South Korea)
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- University of Massachusetts Lowell, Amherst, MA (United States)
- Korea Institute of Science and Technology Information (KISTI), Daejeon (South Korea)
Modern High-Performance Computing (HPC) environments face mounting challenges as datasets shift from large files to many small files and as the number of users and parallelized applications grows. Because HPC systems rely on Parallel File Systems (PFS) such as Lustre for data processing, performance bottlenecks stemming from Object Storage Target (OST) contention have become a significant concern. Existing solutions, such as LADS with its object-level scheduling approach, fall short in large-scale HPC environments because they cannot effectively address metadata I/O bottlenecks or the growing number of I/O processes. This study highlights the need for a comprehensive solution that tackles both OST contention and metadata I/O challenges across diverse HPC workloads. To that end, we propose SwiftLoad, an object-level I/O scheduling framework that leverages a metadata catalog to improve the performance and efficiency of parallel HPC utilities. The metadata catalog mitigates the metadata I/O bottlenecks that commonly occur in HPC utilities and that are particularly pronounced in object-level I/O scheduling. SwiftLoad addresses OST contention and the uneven distribution of I/O processes across OSTs through mathematical modeling, and it incorporates a Loader Configuration Module to regulate the number of I/O processes. Evaluated on a production supercomputer with two representative utilities, data deduplication profiling and data augmentation, SwiftLoad achieved performance improvements of up to 5.63x and 11.0x, respectively.
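The general idea of object-level scheduling driven by a pre-built catalog can be illustrated with a minimal Python sketch. This is not SwiftLoad's implementation: the abstract states that SwiftLoad relies on mathematical modeling and a Loader Configuration Module, neither of which is reproduced here. The catalog layout, the field names (`path`, `ost`, `offset`, `size`), and the functions `build_ost_queues` and `schedule` are all hypothetical, and the fixed `max_per_ost` cap is only a stand-in for whatever process-count regulation the framework actually performs.

```python
from collections import defaultdict, deque

# Hypothetical pre-built catalog: one record per file object (stripe).
# Because the object-to-OST mapping is recorded ahead of time, the
# scheduler below never has to query the metadata server at run time.
CATALOG = [
    {"path": "a.dat", "ost": 0, "offset": 0, "size": 4},
    {"path": "a.dat", "ost": 1, "offset": 4, "size": 4},
    {"path": "b.dat", "ost": 0, "offset": 0, "size": 4},
    {"path": "c.dat", "ost": 0, "offset": 0, "size": 4},
]


def build_ost_queues(catalog):
    """Group catalog records into one FIFO queue per OST."""
    queues = defaultdict(deque)
    for record in catalog:
        queues[record["ost"]].append(record)
    return queues


def schedule(catalog, max_per_ost=1):
    """Split the objects into rounds with at most `max_per_ost` reads per OST.

    Assumption: a fixed per-OST cap stands in for a real contention model.
    """
    queues = build_ost_queues(catalog)
    rounds = []
    while any(queues.values()):  # an empty deque is falsy
        batch = []
        for ost in sorted(queues):
            for _ in range(max_per_ost):
                if queues[ost]:
                    batch.append(queues[ost].popleft())
        rounds.append(batch)
    return rounds


if __name__ == "__main__":
    for number, batch in enumerate(schedule(CATALOG), start=1):
        print(number, [(rec["path"], rec["ost"]) for rec in batch])
```

In this toy setup, no round ever issues more than `max_per_ost` reads against the same OST, which is the contention-avoidance property that object-level scheduling aims for. A real scheduler would dispatch the objects to concurrent I/O processes instead of returning lists.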
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE Office of Science (SC); Korean Government [Ministry of Science and ICT (MSIT)]; Korea Institute of Science and Technology Information
- Grant/Contract Number: AC05-00OR22725
- OSTI ID: 2587474
- Journal Information: IEEE Access, Vol. 13; ISSN 2169-3536
- Publisher: Institute of Electrical and Electronics Engineers (IEEE)
- Country of Publication: United States
- Language: English