OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: TRIO: Burst Buffer Based I/O Orchestration

Abstract

The growing computing power of leadership HPC systems is often accompanied by ever-escalating failure rates. Checkpointing is a common defensive mechanism used by scientific applications for failure recovery. However, directly writing large and bursty checkpointing datasets to a parallel filesystem can incur significant I/O contention on storage servers. Such contention in turn degrades the raw bandwidth utilization of storage servers and prolongs the average job I/O time of concurrent applications. Recently, burst buffers have been proposed as an intermediate layer to absorb the bursty I/O traffic from compute nodes to the storage backend. However, an I/O orchestration mechanism is still needed to efficiently move checkpointing data from burst buffers to the storage backend. In this paper, we propose a burst buffer based I/O orchestration framework, named TRIO, that intercepts and reshapes bursty writes into more sequential write traffic to storage servers. Meanwhile, TRIO coordinates the flushing order among concurrent burst buffers to alleviate contention on storage server bandwidth. Our experimental results reveal that TRIO can deliver 30.5% higher bandwidth and reduce the average job I/O time by 37% on average for data-intensive applications in various checkpointing scenarios.

Authors:
Wang, Teng [1]; Oral, H Sarp [2]; Pritchard, Michael [1]; Wang, Bin [1]; Yu, Weikuan [1]
  1. Auburn University
  2. ORNL
Publication Date:
2015
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE
OSTI Identifier:
1265517
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE Cluster, Chicago, IL, USA, September 8, 2015
Country of Publication:
United States
Language:
English

Citation Formats

Wang, Teng, Oral, H Sarp, Pritchard, Michael, Wang, Bin, and Yu, Weikuan. TRIO: Burst Buffer Based I/O Orchestration. United States: N. p., 2015. Web.
Wang, Teng, Oral, H Sarp, Pritchard, Michael, Wang, Bin, & Yu, Weikuan. TRIO: Burst Buffer Based I/O Orchestration. United States.
Wang, Teng, Oral, H Sarp, Pritchard, Michael, Wang, Bin, and Yu, Weikuan. 2015. "TRIO: Burst Buffer Based I/O Orchestration". United States. https://www.osti.gov/servlets/purl/1265517.
@article{osti_1265517,
title = {TRIO: Burst Buffer Based I/O Orchestration},
author = {Wang, Teng and Oral, H Sarp and Pritchard, Michael and Wang, Bin and Yu, Weikuan},
abstractNote = {The growing computing power of leadership HPC systems is often accompanied by ever-escalating failure rates. Checkpointing is a common defensive mechanism used by scientific applications for failure recovery. However, directly writing large and bursty checkpointing datasets to a parallel filesystem can incur significant I/O contention on storage servers. Such contention in turn degrades the raw bandwidth utilization of storage servers and prolongs the average job I/O time of concurrent applications. Recently, burst buffers have been proposed as an intermediate layer to absorb the bursty I/O traffic from compute nodes to the storage backend. However, an I/O orchestration mechanism is still needed to efficiently move checkpointing data from burst buffers to the storage backend. In this paper, we propose a burst buffer based I/O orchestration framework, named TRIO, that intercepts and reshapes bursty writes into more sequential write traffic to storage servers. Meanwhile, TRIO coordinates the flushing order among concurrent burst buffers to alleviate contention on storage server bandwidth. Our experimental results reveal that TRIO can deliver 30.5% higher bandwidth and reduce the average job I/O time by 37% on average for data-intensive applications in various checkpointing scenarios.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015}
}
