Timely Result-Data Offloading for Improved HPC Center Scratch Provisioning and Serviceability
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- Virginia Polytechnic Institute and State University (Virginia Tech)
- ORNL
Modern High-Performance Computing (HPC) centers are facing a data deluge from emerging scientific applications. Supporting large data entails a significant commitment of the center's high-throughput storage system, the scratch space. However, the scratch space is typically managed using simple purge policies, without sophisticated end-user data services to balance resource consumption and user serviceability. End-user data services such as offloading are performed using point-to-point transfers that cannot reconcile the center's purge deadlines with users' delivery deadlines, cannot adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. Such inefficiencies can be prohibitive to sustaining high performance. In this paper, we address these issues by designing a framework for the timely, decentralized offload of application result data. Our framework uses an overlay of user-specified intermediate and landmark sites to orchestrate a decentralized, fault-tolerant delivery. We have implemented our techniques within a production job scheduler (PBS) and a data transfer tool (BitTorrent). Our evaluation, using both a real implementation and supercomputer job log-driven simulations, shows that the offloading times can be significantly reduced (90.4% for a 5 GB data transfer) and that the exposure window can be minimized while also meeting center-user Service Level Agreements.
- Research Organization:
- Oak Ridge National Laboratory (ORNL)
- Sponsoring Organization:
- ORNL LDRD Director's R&D
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1020778
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 8; ISSN 1045-9219
- Country of Publication:
- United States
- Language:
- English
Similar Records
CATCH: A Cloud-based Adaptive Data Transfer Service for HPC
Conference · Fri Dec 31 23:00:00 EST 2010 · OSTI ID: 1015023
/Scratch as a Cache: Rethinking HPC Center Scratch Storage
Conference · Mon Jun 01 00:00:00 EDT 2009 · OSTI ID: 1004447
Reconciling Scratch Space Consumption, Exposure, and Volatility to Achieve Timely Staging of Job Input Data
Conference · Thu Apr 01 00:00:00 EDT 2010 · OSTI ID: 985302