OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems

Abstract

Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. There are two conflicting challenges to mitigate this load imbalance: (i) optimizing systemwide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O load request latency between clients and servers, i.e., allocating I/O resources efficiently in service to a single application and job run. Moreover, existing approaches that require application changes limit widespread adoption in commercial or proprietary deployments. We propose iez, an “end-to-end control plane” where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information for distributed storage server global data placement, while our design model leverages trace-based optimization techniques to minimize I/O load request latency between clients and servers. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR for large sequential writes and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.
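The core idea the abstract describes, clients adaptively directing writes to a subset of storage servers chosen from real-time load information, can be illustrated with a minimal sketch. This is not the authors' iez implementation; the server names, load values, and function name below are hypothetical, and a real system would gather load from the storage servers themselves.

```python
# Illustrative sketch only (not the iez code): a load-aware placement
# policy in the spirit of the abstract, where a client picks the
# least-loaded subset of I/O servers for each write.

def select_servers(server_loads, stripe_count):
    """Return the `stripe_count` server names with the lowest reported load.

    server_loads: dict mapping server name -> current load (e.g., utilization).
    """
    # Rank servers by ascending load and take the least-loaded subset.
    ranked = sorted(server_loads, key=server_loads.get)
    return ranked[:stripe_count]

# Hypothetical real-time load snapshot for four object storage servers.
loads = {"oss0": 0.82, "oss1": 0.15, "oss2": 0.47, "oss3": 0.05}
print(select_servers(loads, 2))  # prints ['oss3', 'oss1']
```

In practice, a control plane like the one described would refresh these load figures continuously and apply the selection transparently, without application changes.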

Authors:
 Wadhwa, Bharti [1]; Paul, Arnab K. [1]; Neuwirth, Sarah [2]; Wang, Feiyi [3]; Oral, H. [3]; Butt, Ali R. [4]; Cameron, Kirk W. [4]; Bernard, Jon [1]
  1. Virginia Polytechnic Institute and State University
  2. Heidelberg University, Germany
  3. Oak Ridge National Laboratory (ORNL)
  4. Virginia Tech, Blacksburg, VA
Publication Date:
May 2019
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1559654
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2019), Rio de Janeiro, Brazil, May 20-24, 2019
Country of Publication:
United States
Language:
English

Citation Formats

Wadhwa, Bharti, Paul, Arnab K., Neuwirth, Sarah, Wang, Feiyi, Oral, H, Butt, Ali R., Cameron, Kirk W., and Bernard, Jon. iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems. United States: N. p., 2019. Web. doi:10.1109/IPDPS.2019.00070.
Wadhwa, Bharti, Paul, Arnab K., Neuwirth, Sarah, Wang, Feiyi, Oral, H, Butt, Ali R., Cameron, Kirk W., & Bernard, Jon. iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems. United States. doi:10.1109/IPDPS.2019.00070.
Wadhwa, Bharti, Paul, Arnab K., Neuwirth, Sarah, Wang, Feiyi, Oral, H, Butt, Ali R., Cameron, Kirk W., and Bernard, Jon. 2019. "iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems". United States. doi:10.1109/IPDPS.2019.00070. https://www.osti.gov/servlets/purl/1559654.
@inproceedings{osti_1559654,
title = {iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems},
author = {Wadhwa, Bharti and Paul, Arnab K. and Neuwirth, Sarah and Wang, Feiyi and Oral, H and Butt, Ali R. and Cameron, Kirk W. and Bernard, Jon},
abstractNote = {Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. There are two conflicting challenges to mitigate this load imbalance: (i) optimizing systemwide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O load request latency between clients and servers, i.e., allocating I/O resources efficiently in service to a single application and job run. Moreover, existing approaches that require application changes limit widespread adoption in commercial or proprietary deployments. We propose iez, an “end-to-end control plane” where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information for distributed storage server global data placement, while our design model leverages trace-based optimization techniques to minimize I/O load request latency between clients and servers. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR for large sequential writes and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.},
doi = {10.1109/IPDPS.2019.00070},
booktitle = {33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019)},
place = {United States},
year = {2019},
month = {5}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
