Formal Definitions and Performance Comparison of Consistency Models for Parallel File Systems
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- University of Illinois at Urbana-Champaign, IL (United States)
The semantics of HPC storage systems are defined by the consistency models to which they adhere. Storage consistency models have been studied far less than their memory-system counterparts, with the exception of the POSIX standard and its strict consistency model. POSIX consistency imposes a performance penalty that grows more significant as parallel file systems scale up and as the access time of storage devices, such as node-local solid-state drives, decreases. While some efforts have been made to adopt relaxed storage consistency models, these models are often defined informally and ambiguously, as by-products of a particular implementation. In this work, we establish a connection between memory consistency models and storage consistency models and revisit the key design choices of storage consistency models from a high-level perspective. We then propose a formal and unified framework for defining storage consistency models, along with a layered implementation that can be used to easily evaluate their relative performance under different I/O workloads. Finally, we conduct a comprehensive performance comparison of two relaxed consistency models on a range of common parallel I/O workloads, such as checkpoint/restart in scientific applications and random reads in deep learning applications. We demonstrate that for certain I/O scenarios, a weaker consistency model can significantly improve I/O performance. For instance, for the small random reads typical of deep learning applications, session consistency achieved a 5x improvement in I/O bandwidth over commit consistency, even at small scales.
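The contrast the abstract draws between commit consistency and session consistency can be made concrete with a small producer/consumer sketch. In the sketch below, fsync() stands in for the commit point of a commit-style model and close()/open() delimit a session, as in close-to-open consistency; the file path, the use of MPI for process synchronization, and the mapping of these POSIX calls onto the two models are illustrative assumptions, not the paper's framework or API.

```c
/* Sketch: a writer/reader pair under two relaxed storage consistency models.
 * Assumptions (not from the paper): fsync() marks the commit point of a
 * commit-consistency model; close()/open() delimit a session in a
 * session-consistency model. Run with at least 2 MPI ranks. */
#include <fcntl.h>
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const char *path = "/mnt/shared/ckpt.dat";    /* hypothetical shared file */

    if (rank == 0) {                              /* writer */
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        const char buf[] = "checkpoint block";
        write(fd, buf, sizeof buf);
        fsync(fd);   /* commit consistency: writes become visible here */
        close(fd);   /* session consistency: writes become visible at close */
    }

    MPI_Barrier(MPI_COMM_WORLD);  /* order the writer's commit before reads */

    if (rank == 1) {                              /* reader */
        char buf[32] = {0};
        int fd = open(path, O_RDONLY);  /* session: open begins a new session */
        read(fd, buf, sizeof buf - 1);  /* fresh data under either model */
        close(fd);
        printf("rank 1 read: %s\n", buf);
    }

    MPI_Finalize();
    return 0;
}
```

Under strict POSIX consistency the read would be guaranteed fresh as soon as write() returns, with no fsync() or close() required; relaxing that guarantee is what allows a file system to defer the data and metadata synchronization that dominates the cost of small I/O.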
- Research Organization:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- National Science Foundation (NSF); USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC52-07NA27344
- OSTI ID:
- 2370617
- Report Number(s):
- LLNL-JRNL-849174; 1074740
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 35, Issue 6; ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Lightweight storage and overlay networks for fault tolerance. · Technical Report · 2009 · OSTI ID: 989384
- ...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats · Conference · 2008 · OSTI ID: 982187
- Characterizing Machine Learning I/O Workloads on Leadership Scale HPC Systems · Conference · November 2021 · OSTI ID: 1885376