U.S. Department of Energy
Office of Scientific and Technical Information

Leveraging Pre-Built Catalogs and Object-Level Scheduling to Eliminate I/O Bottlenecks in HPC Environments

Journal Article · IEEE Access

Modern High-Performance Computing (HPC) environments face mounting challenges from the shift from large-file to small-file datasets, together with a growing number of users and parallel applications. Because HPC systems rely on Parallel File Systems (PFS) such as Lustre for data processing, performance bottlenecks caused by Object Storage Target (OST) contention have become a significant concern. Existing solutions, such as LADS and its object-level scheduling approach, fall short in large-scale HPC environments because they do not effectively address metadata I/O bottlenecks or the growing number of I/O processes. This study highlights the pressing need for a comprehensive solution that tackles both OST contention and metadata I/O challenges in diverse HPC workloads. To address these challenges, we propose SwiftLoad, an object-level I/O scheduling framework that leverages a metadata catalog to enhance the performance and efficiency of parallel HPC utilities. Adopting a metadata catalog mitigates the metadata I/O bottlenecks common in HPC utilities, which are particularly pronounced under object-level I/O scheduling. SwiftLoad addresses OST contention and the uneven distribution of I/O processes across OSTs through mathematical modeling, and it incorporates a Loader Configuration Module that regulates the number of I/O processes. Evaluated with two representative utilities (data deduplication profiling and data augmentation), SwiftLoad achieved performance improvements of up to 5.63x and 11.0x, respectively, on a production supercomputer.
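The abstract does not spell out SwiftLoad's scheduling model, so the following is only a generic, hypothetical sketch of the two ideas it names, not the authors' implementation. It assumes a pre-built catalog that already records, for each stripe object, the OST holding it, so the loader can group work per OST without issuing per-file metadata queries. Each scheduling round then sends at most `max_per_ost` requests to any single OST, and a simple cap limits how many I/O processes are launched. The names `CatalogEntry`, `schedule_rounds`, `cap_io_processes`, and `max_per_ost` are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog row: one stripe object of a file and the OST that stores it."""
    path: str
    ost: int      # OST index recorded when the catalog was built (assumed)
    offset: int   # byte offset of this object within the file
    length: int   # object length in bytes


def group_by_ost(catalog: List[CatalogEntry]) -> Dict[int, List[CatalogEntry]]:
    """Bucket catalog entries per OST using only the catalog (no stat() or layout queries)."""
    queues: Dict[int, List[CatalogEntry]] = defaultdict(list)
    for entry in catalog:
        queues[entry.ost].append(entry)
    return dict(queues)


def schedule_rounds(catalog: List[CatalogEntry], max_per_ost: int) -> List[List[CatalogEntry]]:
    """Build I/O rounds in which no OST receives more than max_per_ost requests."""
    if max_per_ost < 1:
        raise ValueError("max_per_ost must be >= 1")
    queues = group_by_ost(catalog)
    rounds: List[List[CatalogEntry]] = []
    while any(queues.values()):
        batch: List[CatalogEntry] = []
        for ost in sorted(queues):
            pending = queues[ost]
            batch.extend(pending[:max_per_ost])   # at most max_per_ost objects from this OST
            queues[ost] = pending[max_per_ost:]
        rounds.append(batch)
    return rounds


def cap_io_processes(requested: int, active_osts: int, max_per_ost: int) -> int:
    """Hypothetical loader knob: launch no more I/O processes than the OSTs can absorb."""
    return min(requested, active_osts * max_per_ost)


if __name__ == "__main__":
    mib = 1 << 20
    catalog = [
        CatalogEntry("a.bin", 0, 0, mib),
        CatalogEntry("a.bin", 1, mib, mib),
        CatalogEntry("b.bin", 0, 0, mib),
        CatalogEntry("c.bin", 0, 0, mib),
    ]
    for i, batch in enumerate(schedule_rounds(catalog, max_per_ost=2)):
        print(i, [(e.path, e.ost) for e in batch])
    print("I/O processes:", cap_io_processes(requested=64, active_osts=2, max_per_ost=2))
```

Interleaving OST queues round by round is one simple way to spread concurrent requests; a production scheduler would more likely weight OSTs by pending bytes or observed load, which this sketch does not model.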

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC); Korean Government [Ministry of Science and ICT (MSIT)]; Korea Institute of Science and Technology Information
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
2587474
Journal Information:
IEEE Access, Vol. 13; ISSN 2169-3536
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Country of Publication:
United States
Language:
English

Similar Records

Characterizing output bottlenecks in a supercomputer
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID:1063838

Characterizing output bottlenecks in a supercomputer
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID:1096349

I/O load balancing for big data HPC applications
Conference · Sun Dec 31 23:00:00 EST 2017 · OSTI ID:1415911