DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Ad Hoc File Systems for High-Performance Computing

Journal Article · Journal of Computer Science and Technology
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [8];  [4];  [1]
  1. Johannes Gutenberg Univ., Mainz (Germany)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. Florida State Univ., Tallahassee, FL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
  5. Univ. Politecnica de Catalunya, Barcelona (Spain)
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  7. Barcelona Supercomputing Center, Barcelona (Spain)
  8. Fraunhofer Inst. for Industrial Mathematics ITWM, Kaiserslautern (Germany)

Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Integrating these new storage technologies into scientific workflows is still a mostly manual task today, so most scientists do not take advantage of the faster storage media. One approach to systematically integrating node-local SSDs or NVRAM into scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from Dagstuhl Seminar 17202, “Challenges and Opportunities of User-Level File Systems for HPC”, and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out of data between the storage backend and the ad hoc file systems. Also presented are strategies for building ad hoc file systems from reusable components for networking and ways to improve storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); German Research Foundation (DFG); European Union (EU); Spanish Ministry of Science and Innovation (MICINN); National Science Foundation (NSF); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-06CH11357; 1561041; 1564647; 1744336; 1763547; 1822737; 2014-SGR-1051; TIN2015-65316; 671591; AC52-07NA27344
OSTI ID:
1596689
Alternate ID(s):
OSTI ID: 1606092
Report Number(s):
LLNL-JRNL-779789; 155300
Journal Information:
Journal of Computer Science and Technology, Vol. 35, Issue 1; ISSN 1000-9000
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 13 works
Citation information provided by
Web of Science
