
Programming Abstractions for Managing Workflows on Tiered Storage Systems

Journal Article · ACM Transactions on Storage
DOI: https://doi.org/10.1145/3457119 · OSTI ID: 1898543
Author affiliation: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Scientific workflows in High Performance Computing (HPC) environments process large amounts of data. The storage hierarchy on HPC systems is getting deeper, driven by new technologies (NVRAMs, SSDs, etc.). There is a need for new programming abstractions that allow users to seamlessly manage data at the workflow level on multi-tiered storage systems and that provide optimal workflow performance and use of storage resources. In previous work, we introduced a software architecture, Managing Data on Tiered Storage for Scientific Workflows (MaDaTS), that used a Virtual Data Space (VDS) abstraction to hide the complexities of the underlying storage system while allowing users to control data management strategies. In this article, we detail the data-centric programming abstractions that allow users to manage a workflow around its data on the storage layer. The programming abstractions simplify data management for scientific workflows on multi-tiered storage systems without affecting workflow performance or storage capacity. We measure the overheads introduced by the programming abstractions of MaDaTS and their effectiveness. Our results show that these abstractions can optimally use the storage capacity in lower-capacity storage tiers and simplify data management without adding any performance overheads.
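To make the idea of a data-centric abstraction over tiered storage more concrete, the following is a minimal illustrative sketch in Python. It is not the MaDaTS API: the class names `Tier` and `VirtualDataSpace`, the methods `add`, `release`, and `locate`, the tier names, and the capacities are all hypothetical, and the "fastest tier with room" placement rule is an assumed strategy chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """A storage tier with a fixed capacity (hypothetical units: bytes)."""
    name: str
    capacity: int
    used: int = 0

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


@dataclass
class VirtualDataSpace:
    """Maps logical data names to tiers; tiers are listed fastest first.

    Hypothetical stand-in for a VDS-like abstraction, not the MaDaTS design.
    """
    tiers: list
    placement: dict = field(default_factory=dict)

    def add(self, name: str, size: int) -> str:
        """Place a data object on the fastest tier that still has room."""
        for tier in self.tiers:
            if tier.has_room(size):
                tier.used += size
                self.placement[name] = (tier, size)
                return tier.name
        raise RuntimeError(f"no tier can hold {name!r} ({size} bytes)")

    def release(self, name: str) -> None:
        """Free the space held by data the workflow no longer needs."""
        tier, size = self.placement.pop(name)
        tier.used -= size

    def locate(self, name: str) -> str:
        """Return the name of the tier currently holding the data object."""
        return self.placement[name][0].name


# Assumed two-tier system: a small fast tier and a large slow tier.
vds = VirtualDataSpace(tiers=[Tier("burst_buffer", 100), Tier("parallel_fs", 10_000)])
first = vds.add("input.dat", 80)          # fits on the fast tier
second = vds.add("intermediate.dat", 50)  # fast tier is full, falls back
vds.release("input.dat")                  # input consumed, space reclaimed
third = vds.add("output.dat", 60)         # fast tier has room again
print(first, second, third)
```

In this sketch the workflow code refers to data only by logical name, while the placement decision stays inside the data space object. Releasing consumed data is what lets the small tier be reused, which loosely mirrors the abstract's point about making good use of lower-capacity tiers.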
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1898543
Journal Information:
Journal Name: ACM Transactions on Storage; Vol. 17; Journal Issue: 4; ISSN 1553-3077
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English

Similar Records

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Sun Sep 30 20:00:00 EDT 2018 · Journal of Open Source Software · OSTI ID:1582034

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Sat Dec 31 23:00:00 EST 2016 · OSTI ID:1544386

Managing Data on Tiered Storage for Scientific Workflows (MaDaTS) v1.1.2
Software · Sun Apr 15 20:00:00 EDT 2018 · OSTI ID:code-11375