I/O-aware bandwidth allocation for petascale computing systems
Journal Article · Parallel Computing
In the Big Data era, the gap between storage performance and applications' I/O requirements is increasing. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms performance. Conventional approaches either focus on optimizing an individual application's access pattern or handle I/O requests at a low-level storage layer without any knowledge of the upper-level applications. In this paper, we present a novel I/O-aware bandwidth allocation framework to coordinate ongoing I/O requests on petascale computing systems. The motivation behind this innovation is that the resource management system has a holistic view of both the system state and jobs' activities and can dynamically control the jobs' status or allocate resources on the fly during their execution. We treat a job's I/O requests as periodic subjobs within its lifecycle and transform the I/O congestion issue into a classical scheduling problem. Based on this model, we propose a bandwidth management mechanism as an extension to the existing scheduling system. We design several bandwidth allocation policies with different optimization objectives, targeting either user-oriented metrics or system performance. We conduct extensive trace-based simulations using real job traces and I/O traces from a production IBM Blue Gene/Q system at Argonne National Laboratory. Experimental results demonstrate that our new design can improve job performance by more than 30% while also improving system performance.
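The abstract does not spell out the framework's allocation policies, so the following is only a minimal sketch of the modeling idea it describes: each job's I/O phase is treated as a subjob that must be scheduled on one shared storage bandwidth capacity. Everything here is an assumption for illustration. The names `IORequest`, `simulate`, `fair_share`, `fcfs`, and `smallest_remaining_first` are hypothetical and are not taken from the paper. `fair_share` stands in for uncoordinated contention (every active request gets an equal slice), while the other two are textbook work-conserving scheduling rules used as examples of a coordinated policy.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    """Hypothetical I/O subjob: one I/O phase of a job."""
    job_id: str
    arrival: float  # time the I/O phase starts
    volume: float   # data to move (e.g., GB)


def simulate(requests, capacity, policy):
    """Event-driven simulation of I/O subjobs sharing one storage bandwidth.

    `policy(active, remaining, capacity)` returns a rate per job_id whose sum
    does not exceed `capacity`. Returns {job_id: completion_time}.
    """
    remaining = {r.job_id: r.volume for r in requests}
    pending = sorted(requests, key=lambda r: r.arrival)
    active, done, t = [], {}, 0.0
    while pending or active:
        # Admit every request whose I/O phase has started by time t.
        while pending and pending[0].arrival <= t:
            active.append(pending.pop(0))
        if not active:
            t = pending[0].arrival  # storage idle: jump to the next arrival
            continue
        rates = policy(active, remaining, capacity)
        # Advance to the next event: a completion or the next arrival.
        dt = min(remaining[r.job_id] / rates[r.job_id]
                 for r in active if rates[r.job_id] > 0)
        if pending:
            dt = min(dt, pending[0].arrival - t)
        for r in active:
            remaining[r.job_id] -= rates[r.job_id] * dt
        t += dt
        still_active = []
        for r in active:
            if remaining[r.job_id] <= 1e-9:
                done[r.job_id] = t
            else:
                still_active.append(r)
        active = still_active
    return done


def fair_share(active, remaining, capacity):
    """Uncoordinated contention: split bandwidth equally."""
    share = capacity / len(active)
    return {r.job_id: share for r in active}


def fcfs(active, remaining, capacity):
    """Give all bandwidth to the earliest-arriving request (ties by id)."""
    first = min(active, key=lambda r: (r.arrival, r.job_id))
    return {r.job_id: capacity if r.job_id == first.job_id else 0.0 for r in active}


def smallest_remaining_first(active, remaining, capacity):
    """Give all bandwidth to the request with the least data left (ties by id)."""
    first = min(active, key=lambda r: (remaining[r.job_id], r.job_id))
    return {r.job_id: capacity if r.job_id == first.job_id else 0.0 for r in active}


if __name__ == "__main__":
    reqs = [IORequest("A", 0.0, 96.0), IORequest("B", 0.0, 24.0),
            IORequest("C", 0.0, 24.0)]
    for policy in (fair_share, fcfs, smallest_remaining_first):
        done = simulate(reqs, capacity=12.0, policy=policy)
        print(policy.__name__, done, sum(done.values()) / len(done))
```

With a capacity of 12 and these three requests, every policy finishes all I/O at time 12, since each one keeps the storage fully busy. The mean I/O completion time, however, is 8 under `fair_share`, 10 under `fcfs`, and 6 under `smallest_remaining_first`. This illustrates, under the toy assumptions above, how the choice of policy can shift a user-oriented metric without changing system throughput.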
- Research Organization: Argonne National Laboratory (ANL)
- Sponsoring Organization: National Science Foundation (NSF); USDOE Office of Science
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1429892
- Journal Information: Parallel Computing, Vol. 58, Issue C; ISSN 0167-8191
- Publisher: Elsevier
- Country of Publication: United States
- Language: English
Similar Records
Optimizing Center Performance through Coordinated Data Staging, Scheduling and Recovery
Conference · Dec 31, 2006 · OSTI ID:1000413
iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems
Conference · May 1, 2019 · OSTI ID:1559654