OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

Abstract

The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc: they involve an iterative development process in which users compose and test their workflows on desktops and then scale up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, which show that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
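
The four template names in the abstract can be illustrated with a minimal Python sketch. This is not the Tigres API: the function names, signatures, and the use of a thread pool are assumptions made for illustration, and the semantics shown for split and merge are one plausible reading of those template names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers for illustration only; these are NOT Tigres functions.


def sequence(tasks, value):
    """Run tasks one after another, feeding each output to the next task."""
    for task in tasks:
        value = task(value)
    return value


def parallel(tasks, inputs):
    """Run task i on input i concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, item) for task, item in zip(tasks, inputs)]
        return [future.result() for future in futures]


def split(split_task, value, parallel_tasks):
    """One task produces a list of pieces, which are then processed in parallel."""
    pieces = split_task(value)
    return parallel(parallel_tasks, pieces)


def merge(parallel_tasks, inputs, merge_task):
    """Process the inputs in parallel, then combine the results with one task."""
    return merge_task(parallel(parallel_tasks, inputs))


if __name__ == "__main__":
    # Sequence: (3 + 1) * 2
    print(sequence([lambda x: x + 1, lambda x: x * 2], 3))
    # Merge: count characters per word in parallel, then sum the counts
    print(merge([len, len], ["data", "pipeline"], sum))
```

Because each helper takes plain callables, the same composition can be tried on a desktop first and the execution mechanism swapped out later, which mirrors the iterative development cycle the abstract describes.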

Authors:
Hendrix, Valerie [1]; Fox, James [1]; Ghoshal, Devarshi [1]; Ramakrishnan, Lavanya [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2016-07-21
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1379520
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Additional Journal Information:
Conference: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016, Cartagena (Colombia), 16-19 May 2016
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Monitoring; Programming; Libraries; Syntactics; Arrays; Pipelines; Collaboration; Data Analysis; Scientific Workflows; High Performance Computing

Citation Formats

Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, and Ramakrishnan, Lavanya. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems. United States: N. p., 2016. Web. doi:10.1109/CCGrid.2016.54.
Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, & Ramakrishnan, Lavanya. (2016). Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems. United States. doi:10.1109/CCGrid.2016.54.
Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, and Ramakrishnan, Lavanya. 2016. "Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems". United States. doi:10.1109/CCGrid.2016.54. https://www.osti.gov/servlets/purl/1379520.
@article{osti_1379520,
title = {Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems},
author = {Hendrix, Valerie and Fox, James and Ghoshal, Devarshi and Ramakrishnan, Lavanya},
abstractNote = {The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc: they involve an iterative development process in which users compose and test their workflows on desktops and then scale up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, which show that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.},
doi = {10.1109/CCGrid.2016.54},
journal = {Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016},
place = {United States},
year = {2016},
month = jul
}
