End-to-end I/O portfolio for the Summit supercomputing ecosystem

Oral, Sarp; Vazhkudai, Sudharshan; Wang, Feiyi; Zimmer, Christopher; Brumgard, Christopher; Hanley, Jesse; Markomanolis, George; Miller, Ross; Leverman, Dustin B.; Atchley, Scott; Melesse Vergara, Veronica

doi:10.1145/3295500.3356157

Conference · Fri Nov 01 04:00:00 EDT 2019

DOI: https://doi.org/10.1145/3295500.3356157 · OSTI ID: 1619016

ORNL

The I/O subsystem for the Summit supercomputer, No. 1 on the Top500 list, and its ecosystem of analysis platforms is composed of two distinct layers: the in-system layer and the center-wide parallel file system (PFS) layer, Spider 3. The in-system layer uses node-local SSDs and provides 26.7 TB/s for reads, 9.7 TB/s for writes, and 4.6 billion IOPS to Summit. The Spider 3 PFS layer uses IBM's Spectrum Scale™ and provides 2.5 TB/s and 2.6 million IOPS to Summit and other systems. While deploying them as two distinct layers was operationally efficient, it also presented usability challenges in terms of multiple mount points and a lack of transparency in data movement. To address these challenges, we have developed novel end-to-end I/O solutions for the concerted use of the two storage layers. We present the I/O subsystem architecture, the end-to-end I/O solution space, the design considerations, and our deployment experience.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

DOE Contract Number: AC05-00OR22725

Country of Publication: United States

Language: English
