DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on January 17, 2021

Title: Ad Hoc File Systems for High-Performance Computing

Abstract

Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Including these new storage technologies in scientific workflows is unfortunately still a mostly manual task today, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically include node-local SSDs or NVRAMs in scientific workflows is to deploy ad hoc file systems over a set of compute nodes; these file systems serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 “Challenges and Opportunities of User-Level File Systems for HPC” and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule the stage-in and stage-out of data between the storage backend and the ad hoc file systems. Also presented are strategies for building ad hoc file systems from reusable networking components and for improving storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. These examples range from file systems running in production to cutting-edge research aimed at reaching the performance limits of the underlying devices.

Authors:
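Brinkmann, André [1]; Mohror, Kathryn [2]; Yu, Weikuan [3]; Carns, Philip [4]; Cortes, Toni [5]; Klasky, Scott A. [6]; Miranda, Alberto [7]; Pfreundt, Franz-Josef [8]; Ross, Robert B. [4]; Vef, Marc-André [1]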
  1. Johannes Gutenberg Univ., Mainz (Germany)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. Florida State Univ., Tallahassee, FL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
  5. Univ. Politecnica de Catalunya, Barcelona (Spain)
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  7. Barcelona Supercomputing Center, Barcelona (Spain)
  8. Fraunhofer Inst. for Industrial Mathematics ITWM, Kaiserslautern (Germany)
Publication Date:
January 2020
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); German Research Foundation (DFG); European Union (EU); Spanish Ministry of Science and Innovation (MICINN); National Science Foundation (NSF)
OSTI Identifier:
1596689
Grant/Contract Number:  
AC02-06CH11357; 1561041; 1564647; 1744336; 1763547; 1822737; 2014-SGR-1051; TIN2015-65316; 671591
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Computer Science and Technology
Additional Journal Information:
Journal Volume: 35; Journal Issue: 1; Journal ID: ISSN 1000-9000
Country of Publication:
United States
Language:
English
Subject:
Burst Buffers; Distributed File Systems; High-Performance Computing; POSIX; Parallel Architectures

Citation Formats

Brinkmann, André, Mohror, Kathryn, Yu, Weikuan, Carns, Philip, Cortes, Toni, Klasky, Scott A., Miranda, Alberto, Pfreundt, Franz-Josef, Ross, Robert B., and Vef, Marc-André. Ad Hoc File Systems for High-Performance Computing. United States: N. p., 2020. Web. doi:10.1007/s11390-020-9801-1.
Brinkmann, André, Mohror, Kathryn, Yu, Weikuan, Carns, Philip, Cortes, Toni, Klasky, Scott A., Miranda, Alberto, Pfreundt, Franz-Josef, Ross, Robert B., & Vef, Marc-André. Ad Hoc File Systems for High-Performance Computing. United States. doi:10.1007/s11390-020-9801-1.
Brinkmann, André, Mohror, Kathryn, Yu, Weikuan, Carns, Philip, Cortes, Toni, Klasky, Scott A., Miranda, Alberto, Pfreundt, Franz-Josef, Ross, Robert B., and Vef, Marc-André. 2020. "Ad Hoc File Systems for High-Performance Computing". United States. doi:10.1007/s11390-020-9801-1.
@article{osti_1596689,
title = {Ad Hoc File Systems for High-Performance Computing},
author = {Brinkmann, André and Mohror, Kathryn and Yu, Weikuan and Carns, Philip and Cortes, Toni and Klasky, Scott A. and Miranda, Alberto and Pfreundt, Franz-Josef and Ross, Robert B. and Vef, Marc-André},
abstractNote = {Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Including these new storage technologies into scientific workflows is unfortunately today a mostly manual task, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically include node-local SSDs or NVRAMs into scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 “Challenges and Opportunities of User-Level File Systems for HPC” and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out processes of data between the storage backend and the ad hoc file systems. Also presented are strategies to build ad hoc file systems by using reusable components for networking and how to improve storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.},
doi = {10.1007/s11390-020-9801-1},
journal = {Journal of Computer Science and Technology},
number = 1,
volume = 35,
place = {United States},
year = {2020},
month = {1}
}
