DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Ad Hoc File Systems for High-Performance Computing

Journal Article · Journal of Computer Science and Technology
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [8];  [4];  [1]
  1. Johannes Gutenberg Univ., Mainz (Germany)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. Florida State Univ., Tallahassee, FL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
  5. Univ. Politecnica de Catalunya, Barcelona (Spain)
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  7. Barcelona Supercomputing Center, Barcelona (Spain)
  8. Fraunhofer Inst. for Industrial Mathematics ITWM, Kaiserslautern (Germany)

Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Integrating these new storage technologies into scientific workflows is still a mostly manual task, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically integrating node-local SSDs or NVRAM into scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 “Challenges and Opportunities of User-Level File Systems for HPC” and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule the stage-in and stage-out of data between the storage backend and the ad hoc file systems. Also presented are strategies for building ad hoc file systems from reusable networking components and for improving storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); German Research Foundation (DFG); European Union (EU); Spanish Ministry of Science and Innovation (MICINN); National Science Foundation (NSF)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1596689
Journal Information:
Journal of Computer Science and Technology, Vol. 35, Issue 1; ISSN 1000-9000
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
