U.S. Department of Energy
Office of Scientific and Technical Information

Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System

Conference
As scientific applications running on high-performance computing (HPC) facilities generate parallel I/O workloads of increasing scale and intensity, understanding I/O dynamics has become critical to the HPC community, especially the root causes of I/O performance variability and degradation in HPC environments. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations revealed valuable insights into the characteristics of the storage system and the root causes of I/O performance variability. We then leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation shows that the proposed approach reduces the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.
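The abstract attributes the improvement to optimizing data striping and placement. As background only, here is a minimal sketch of generic round-robin (RAID-0 style) striping of the kind used by parallel file systems such as Lustre. A file is cut into fixed-size stripes that are dealt in turn across an ordered list of storage targets. This is not the authors' middleware design. `StripeLayout`, `locate`, `bytes_per_target`, `load_on_targets`, and the 1 MiB, four-target parameters are hypothetical. Real file systems also pick targets through their own allocators, which this sketch does not model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical round-robin (RAID-0 style) layout: fixed-size stripes
    dealt in order across a list of storage-target IDs."""
    stripe_size: int           # bytes per stripe
    targets: tuple[int, ...]   # storage-target IDs, in round-robin order

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a file byte offset to (target ID, offset within that target's object)."""
        stripe_number = offset // self.stripe_size
        count = len(self.targets)
        target = self.targets[stripe_number % count]
        # Each full pass over all targets adds one stripe to every target's object.
        object_offset = (stripe_number // count) * self.stripe_size + offset % self.stripe_size
        return target, object_offset

    def bytes_per_target(self, offset: int, length: int) -> Counter:
        """Count how many bytes of the extent [offset, offset + length) land on each target."""
        load = Counter()
        end = offset + length
        while offset < end:
            target, _ = self.locate(offset)
            # Advance to the end of the current stripe or of the extent, whichever comes first.
            stripe_end = (offset // self.stripe_size + 1) * self.stripe_size
            chunk = min(stripe_end, end) - offset
            load[target] += chunk
            offset += chunk
        return load


def load_on_targets(layout: StripeLayout, writes: list[tuple[int, int]]) -> dict[int, int]:
    """Total bytes each target receives from a set of (offset, length) writes."""
    total = Counter()
    for offset, length in writes:
        total += layout.bytes_per_target(offset, length)
    return dict(total)


MiB = 1 << 20
# Four hypothetical processes, each writing one contiguous 1 MiB block of a shared file.
writes = [(rank * MiB, MiB) for rank in range(4)]

wide = StripeLayout(stripe_size=MiB, targets=(0, 1, 2, 3))
narrow = StripeLayout(stripe_size=MiB, targets=(0,))
print(load_on_targets(wide, writes))    # {0: 1048576, 1: 1048576, 2: 1048576, 3: 1048576}
print(load_on_targets(narrow, writes))  # {0: 4194304}
```

With four targets, the same four writes are spread evenly instead of queuing on a single target. This illustrates why layout choices can affect per-process write latency. It does not model contention, network effects, or the paper's specific placement policy.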
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1474694
Country of Publication:
United States
Language:
English
