U.S. Department of Energy
Office of Scientific and Technical Information

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Journal Article ·
 [1];  [1]
  1. Lawrence Berkeley National Lab, Berkeley, CA, USA
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing input, output, and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes make managing data for scientific workflows more complex. Thus, we need to manage scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce the Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific workflows and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance, and scalability gains of MaDaTS compared to the traditional approach of managing data in scientific workflows.
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE
OSTI ID:
1544386
Country of Publication:
United States
Language:
English

Similar Records

Programming Abstractions for Managing Workflows on Tiered Storage Systems
Journal Article · Sun Oct 24 20:00:00 EDT 2021 · ACM Transactions on Storage · OSTI ID:1898543

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Sun Sep 30 20:00:00 EDT 2018 · Journal of Open Source Software · OSTI ID:1582034

Managing Data on Tiered Storage for Scientific Workflows (MaDaTS) v1.1.2
Software · Sun Apr 15 20:00:00 EDT 2018 · OSTI ID:code-11375
