MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
- Lawrence Berkeley National Lab, Berkeley, CA, USA
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities to managing data for scientific workflows. Thus, we need to manage scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS as compared to the traditional approach of managing data in scientific workflows.
- Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
- Sponsoring Organization: USDOE
- OSTI ID: 1544386
- Country of Publication: United States
- Language: English
Similar Records
Programming Abstractions for Managing Workflows on Tiered Storage Systems
Journal Article · Sun Oct 24 20:00:00 EDT 2021 · ACM Transactions on Storage · OSTI ID:1898543
MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Sun Sep 30 20:00:00 EDT 2018 · Journal of Open Source Software · OSTI ID:1582034
Managing Data on Tiered Storage for Scientific Workflows (MaDaTS) v1.1.2
Software · Sun Apr 15 20:00:00 EDT 2018 · OSTI ID:code-11375