Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?
Abstract
For a production high-performance computing (HPC) system, where storage devices are shared among multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose balanced messaging-based re-routing in conjunction with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that can scale up to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance is improved, while throttling re-routing to limit the impact on reads. An analytical model is derived to guide the choice of the optimal throttling factor. We thoroughly analyze the overhead of the virtual messaging layer and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified against a synthetic benchmark and is also used by production applications.
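The re-routing and throttling idea in the abstract can be illustrated with a toy model. The sketch below is hypothetical and is not the authors' algorithm: it assumes a single centralized router that tracks a per-target load count, omits the two-level messaging layer entirely, and treats the throttling factor simply as the maximum fraction of write requests allowed to leave their default storage target. The names `route_writes` and `variability` are illustrative only. The `variability` helper shows one way absolute and relative variability can diverge: if buffering shrinks every write time by the same factor, the standard deviation shrinks with it, but the coefficient of variation does not.

```python
import statistics
from typing import List, Tuple


def route_writes(defaults: List[int], loads: List[int], throttle: float) -> List[int]:
    """Toy throttled re-routing (hypothetical; not the paper's algorithm).

    defaults: default storage-target index for each write request
    loads:    current load estimate per storage target (e.g. queued requests)
    throttle: fraction of requests, in [0, 1], allowed to leave their default target
    """
    if not 0.0 <= throttle <= 1.0:
        raise ValueError("throttle must be in [0, 1]")
    loads = list(loads)  # work on a copy so the caller's list is untouched
    budget = int(throttle * len(defaults))  # cap on how many requests may be re-routed
    chosen = []
    for target in defaults:
        # Find the least-loaded target; ties go to the lowest index.
        least = min(range(len(loads)), key=loads.__getitem__)
        if budget > 0 and loads[least] < loads[target]:
            target = least  # re-route to a strictly less congested target
            budget -= 1
        loads[target] += 1  # account for the request just placed
        chosen.append(target)
    return chosen


def variability(times: List[float]) -> Tuple[float, float]:
    """Return (absolute, relative) variability: population std dev and coefficient of variation."""
    sd = statistics.pstdev(times)
    return sd, sd / statistics.mean(times)


if __name__ == "__main__":
    # Four writes all default to target 0, which is already busy; target 1 is idle.
    print(route_writes([0, 0, 0, 0], [5, 0], throttle=0.5))  # [1, 1, 0, 0]

    # Halving every write time halves the absolute spread but leaves the relative spread unchanged.
    raw = [10.0, 12.0, 20.0]
    buffered = [t / 2 for t in raw]
    print(variability(raw), variability(buffered))
```

With `throttle=0.5`, only two of the four writes leave the busy target; raising `throttle` to 1.0 moves all four to the idle target. In this toy, the throttling factor is the single knob trading write balance against how many requests leave their default location, which is the trade-off the abstract's throttling is meant to control.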
- Authors:
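- Huang, Dan; Liu, Qing; Choi, Jong Youl; Podhorszki, Norbert; Klasky, Scott A.; Logan, Jeremy; Ostrouchov, George; He, Xubin; Wolf, Matthew D.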
- New Jersey Inst. of Technology, Newark, NJ (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Temple Univ., Philadelphia, PA (United States)
- Publication Date:
- November 19, 2018
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1559601
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- IEEE Transactions on Computers
- Additional Journal Information:
- Journal Volume: 68; Journal Issue: 5; Journal ID: ISSN 0018-9340
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; High-performance computing; storage; quality of service; variability
Citation Formats
Huang, Dan, Liu, Qing, Choi, Jong Youl, Podhorszki, Norbert, Klasky, Scott A., Logan, Jeremy, Ostrouchov, George, He, Xubin, and Wolf, Matthew D. Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems? United States: N. p., 2018. Web. doi:10.1109/TC.2018.2881709.
Huang, Dan, Liu, Qing, Choi, Jong Youl, Podhorszki, Norbert, Klasky, Scott A., Logan, Jeremy, Ostrouchov, George, He, Xubin, & Wolf, Matthew D. Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems? United States. https://doi.org/10.1109/TC.2018.2881709
Huang, Dan, Liu, Qing, Choi, Jong Youl, Podhorszki, Norbert, Klasky, Scott A., Logan, Jeremy, Ostrouchov, George, He, Xubin, and Wolf, Matthew D. 2018.
"Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?" United States. https://doi.org/10.1109/TC.2018.2881709. https://www.osti.gov/servlets/purl/1559601.
@article{osti_1559601,
title = {Can {I/O} Variability Be Reduced on {QoS}-Less {HPC} Storage Systems?},
author = {Huang, Dan and Liu, Qing and Choi, Jong Youl and Podhorszki, Norbert and Klasky, Scott A. and Logan, Jeremy and Ostrouchov, George and He, Xubin and Wolf, Matthew D.},
abstractNote = {For a production high-performance computing (HPC) system, where storage devices are shared among multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose balanced messaging-based re-routing in conjunction with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that can scale up to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance is improved, while throttling re-routing to limit the impact on reads. An analytical model is derived to guide the choice of the optimal throttling factor. We thoroughly analyze the overhead of the virtual messaging layer and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified against a synthetic benchmark and is also used by production applications.},
doi = {10.1109/TC.2018.2881709},
journal = {IEEE Transactions on Computers},
number = 5,
volume = 68,
place = {United States},
year = {2018},
month = nov
}