Small File Aggregation with PLFS [Slides]
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Today’s computational science demands have resulted in ever larger parallel computers, and storage systems have grown to match these demands. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other potential workloads. While some applications have adapted to I/O best practices and can obtain good performance on these systems, the natural I/O patterns of many applications generate a huge number of small files, the creation of which is poorly served by current parallel file systems at very large scale. This paper describes a new technique for optimizing small file access in parallel file systems for these very large scale systems. The idea is to use a virtual parallel log-structured file system on the compute nodes to aggregate large numbers of small files in compute node memory and then stream their data sequentially to a much smaller number of physical files on an underlying parallel file system. The technique is implemented and evaluated using PLFS as the aggregating middleware. We evaluate our system with micro-benchmarks on a local OS X file system and with an MPI extension of the standard Postmark benchmark to provide results at scale on both Lustre and PanFS parallel file systems. We observe as much as a 33x improvement in small file create rates on a single host, and a 30x improvement in small file write rates, compared to a baseline Lustre configuration on a leadership computing platform using 16,384 cores, and achieve an unprecedented create rate of 200 million files per second.
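The aggregation idea in the abstract can be illustrated with a minimal single-process sketch. It is not PLFS code and does not use the PLFS API: the class `SmallFileAggregator`, its method names, and the in-memory index layout are all hypothetical, chosen only to show small files being buffered in memory and then streamed sequentially into one physical log file.

```python
import os


class SmallFileAggregator:
    """Hypothetical sketch (not the PLFS API): buffer many small logical
    files in memory, then append them to a single physical log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.pending = []  # (name, data) pairs held in memory until flush
        self.index = {}    # logical name -> (offset, length) in the log

    def create(self, name, data):
        # Creating a small file touches only memory, not the file system.
        self.pending.append((name, bytes(data)))

    def flush(self):
        # Stream all buffered data sequentially into one physical file.
        with open(self.log_path, "ab") as log:
            offset = log.seek(0, os.SEEK_END)
            for name, data in self.pending:
                log.write(data)
                self.index[name] = (offset, len(data))
                offset += len(data)
        self.pending.clear()

    def read(self, name):
        # Resolve a logical file through the index, then read its extent.
        offset, length = self.index[name]
        with open(self.log_path, "rb") as log:
            log.seek(offset)
            return log.read(length)
```

In this sketch, any number of `create` calls followed by one `flush` produces a single file on the underlying file system, which is the property the abstract relies on to reduce small file create load. A real implementation would also need to persist the index and coordinate across compute nodes, which this example leaves out.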
- Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: AC52-06NA25396
- OSTI ID: 1070054
- Report Number(s): LA-UR--13-22024
- Country of Publication: United States
- Language: English
Similar Records
...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats
Conference · Wed Dec 31 23:00:00 EST 2008 · OSTI ID:982187
Reducing Concurrency Bottlenecks in Parallel I/O Workloads
Conference · Fri Dec 31 23:00:00 EST 2010 · OSTI ID:1012628
Certification of Completion of Level-2 Milestone 464: Complete Phase 1 Integration of Site-Wide Global Parallel File System (SWGPFS)
Technical Report · Mon Jan 23 23:00:00 EST 2006 · OSTI ID:898470