OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

Abstract

The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments and typically require expert help to compose and execute workflows. Data-intensive workflows are often ad hoc: they involve an iterative development process in which users compose and test their workflows on desktops before scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, which show that Tigres performs with minimal template overhead (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
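To make the template model concrete, here is a minimal, self-contained Python sketch of the four template patterns (sequence, parallel, split, merge). The helper names and the toy pipeline below are illustrative stand-ins, not the actual Tigres API.

```python
# Illustrative sketch of the four template patterns in plain Python.
# The helper names below are hypothetical, not the Tigres API.
from concurrent.futures import ThreadPoolExecutor

def sequence(tasks, value):
    # Chain tasks: each task's output becomes the next task's input.
    for task in tasks:
        value = task(value)
    return value

def parallel(task, inputs):
    # Fan a single task across many independent inputs concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, inputs))

def split(task, value, branches):
    # One task's output fans out to several parallel branches.
    seed = task(value)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda branch: branch(seed), branches))

def merge(results, task):
    # Parallel branch results fan back in to a single task.
    return task(results)

if __name__ == "__main__":
    # Toy pipeline: clean the input, fan out two analyses, aggregate.
    text = sequence([str.strip, str.lower], "  RAW Sample  ")
    parts = split(str.split, text, [len, lambda words: len(set(words))])
    print(merge(parts, max))  # -> 2
```

Each helper mirrors the dataflow of the corresponding template: sequence chains outputs into inputs, parallel fans one task across many inputs, split fans one output out to several branches, and merge fans branch results back into a single task.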

Authors:
 Hendrix, Valerie [1]; Fox, James [1]; Ghoshal, Devarshi [1]; Ramakrishnan, Lavanya [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
July 2016
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1379520
Grant/Contract Number:
AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Additional Journal Information:
Conference: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016, Cartagena (Colombia), 16-19 May 2016
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Monitoring; Programming; Libraries; Syntactics; Arrays; Pipelines; Collaboration; Data Analysis; Scientific Workflows; High Performance Computing

Citation Formats

Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, and Ramakrishnan, Lavanya. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems. United States: N. p., 2016. Web. doi:10.1109/CCGrid.2016.54.
Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, & Ramakrishnan, Lavanya. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems. United States. doi:10.1109/CCGrid.2016.54.
Hendrix, Valerie, Fox, James, Ghoshal, Devarshi, and Ramakrishnan, Lavanya. 2016. "Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems". United States. doi:10.1109/CCGrid.2016.54. https://www.osti.gov/servlets/purl/1379520.
@article{osti_1379520,
title = {Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems},
author = {Hendrix, Valerie and Fox, James and Ghoshal, Devarshi and Ramakrishnan, Lavanya},
abstractNote = {The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments and typically require expert help to compose and execute workflows. Data-intensive workflows are often ad hoc: they involve an iterative development process in which users compose and test their workflows on desktops before scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, which show that Tigres performs with minimal template overhead (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.},
doi = {10.1109/CCGrid.2016.54},
journal = {Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016},
place = {United States},
year = {2016},
month = jul
}

Journal Article:
Free Publicly Available Full Text (Publisher's Version of Record)

Similar Records:
  • Data-intensive scientific workflows are often modeled using a dataflow-oriented model. The simplicity of a dataflow model facilitates intuitive workflow design, analysis, and optimization. However, some amount of control-flow modeling is often necessary for engineering fault-tolerant, robust, and adaptive workflows. Modeling the control flow using inherent dataflow constructs quickly yields a workflow that is hard to comprehend, reuse, and maintain. In this paper, we propose a context-aware architecture for scientific workflows. By incorporating contexts within a dataflow-oriented scientific workflow system, we enable the development of context-aware scientific workflows without the need for numerous low-level control-flow actors. This results in a workflow that is aware of its environment during execution with minimal user input and responds intelligently based on such awareness at runtime. A further advantage of our approach is that the defined contexts can be reused and shared across other workflows. We demonstrate our approach with two prototype implementations of context-aware actors in Kepler. (A sketch of this context-binding idea appears after this list.)
  • Purpose: To extend a clinical Record and Verify (R&V) system to enable a safe and fast workflow for Plan-of-the-Day (PotD) adaptive treatments based on patient-specific plan libraries. Methods: Plan libraries for PotD adaptive treatments contain, for each patient, several pre-treatment generated treatment plans. They may be generated for various patient anatomies or CTV-PTV margins. For each fraction, a Cone Beam CT scan is acquired to support the selection of the plan that best fits the patient's anatomy-of-the-day. To date, there are no commercial R&V systems that support PotD delivery strategies. Consequently, the clinical workflow requires many manual interventions. Moreover, multiple scheduled plans carry a high risk of excessive dose delivery. In this work we extended a commercial R&V system (MOSAIQ) to support PotD workflows using IQ-scripting. The PotD workflow was designed after extensive risk analysis of the manual procedure, and all identified risks were incorporated as logical checks. Results: All manual PotD activities were automated. The workflow first identifies whether the patient is scheduled for PotD, then performs safety checks, and continues to treatment plan selection only if no issues were found. The user selects the plan to deliver from a list of candidate plans. After plan selection, the workflow makes the treatment fields of the selected plan available for delivery by adding them to the treatment calendar. Finally, control is returned to the R&V system to commence treatment. Additional logic was added to incorporate off-line changes such as updating the plan library. After extensive testing, including treatment fraction interrupts and plan-library updates during the treatment course, the workflow is running successfully in a clinical pilot, in which 35 patients have been treated since October 2014. Conclusion: We have extended a commercial R&V system for improved safety and efficiency in library-based adaptive strategies, enabling widespread implementation of those strategies. This work was in part funded by a research grant from Elekta AB, Stockholm, Sweden. (A sketch of the automated plan-selection logic appears after this list.)
  • We describe the design and implementation of a web-accessible scientific workflow system for environmental performance monitoring. This workflow environment integrates distributed automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client, a back-end server for methodical data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base, and provides a seamless integration between data selection, analysis applications, and result delivery. This workflow system has been implemented for several sites and monitoring systems with different degrees of complexity.
  • We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic Javascript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity. (A sketch of the web-service invocation pattern used by such a back-end appears after this list.)
  • A context-aware scientific workflow is a typical scientific workflow that is enhanced with context binding and awareness mechanisms. Context facilitates further configuration of the scientific workflow at runtime such that it is tuned to its environment during execution and responds intelligently based on such awareness without customized coding of the workflow. In this paper, we present a context annotation framework, which supports rapid development of context-aware scientific workflows. Context annotation enables diverse types of actors in Kepler to bind different sensed environmental information as part of the actor's regular data. Context-aware actors simplify the construction of scientific workflows that require intricate knowledge in initializing and configuring a large number of parameters to cover all different execution conditions. This paper presents the motivation, system design, implementation, and usage of context annotation in relation to the Kepler scientific workflow system. (See the context-binding sketch after this list.)
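For the context-aware Kepler items above, here is a minimal Python sketch of the context-binding idea, under the assumption that "context" means a bundle of sensed environment readings injected into a task at runtime. It is not Kepler code; all names are invented for illustration.

```python
# Illustrative sketch (not Kepler code): sensed environment state is
# bound to a dataflow task at runtime instead of being wired in with
# extra low-level control-flow actors.
import time

class Context:
    # Hypothetical reusable context: named zero-argument sensor reads.
    def __init__(self, sensors):
        self.sensors = sensors

    def snapshot(self):
        # Read every sensor once and return the observed environment.
        return {name: read() for name, read in self.sensors.items()}

def context_aware(context):
    # Decorator that hands each task a fresh snapshot of its environment.
    def wrap(task):
        def run(data):
            return task(data, context.snapshot())
        return run
    return wrap

# The same context can be shared and reused across workflows.
ctx = Context({"timestamp": time.time, "cpu_load": lambda: 0.42})

@context_aware(ctx)
def analyze(data, env):
    # The task can adapt to the observed environment at runtime.
    return {"result": data.upper(), "observed": env}

print(analyze("sample"))
```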
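For the Plan-of-the-Day item above, the following is a hypothetical sketch of the automated selection logic as described (schedule check, safety checks, plan selection, calendar update, return of control). All object and method names are invented; the actual system extends MOSAIQ via IQ-scripting, whose API is not shown here.

```python
# Hypothetical sketch of the Plan-of-the-Day (PotD) workflow steps.
def run_potd_fraction(patient, plan_library, rv):
    # Only patients scheduled for PotD enter the adaptive workflow.
    if not rv.is_scheduled_for_potd(patient):
        return rv.standard_workflow(patient)

    # Safety checks encode the risks identified in the risk analysis;
    # plan selection proceeds only if no issues were found.
    issues = rv.safety_checks(patient, plan_library)
    if issues:
        raise RuntimeError(f"PotD safety checks failed: {issues}")

    # The user selects the candidate plan that best fits the anatomy
    # of the day, as judged from the fraction's Cone Beam CT scan.
    plan = rv.prompt_plan_selection(plan_library.candidates(patient))

    # Only the selected plan's fields are made deliverable, guarding
    # against excessive dose from multiple scheduled plans.
    rv.add_to_treatment_calendar(patient, plan.treatment_fields)

    # Control returns to the R&V system to commence treatment.
    return rv.commence_treatment(patient)
```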
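For the two environmental-monitoring items above, here is an illustrative Python sketch of the back-end pattern they describe: invoking a third-party analysis application through a web service and relaying the result for browser-side display. The endpoint URL and JSON shape are invented, and the actual back-end described above is written in PHP.

```python
# Illustrative back-end call to a third-party analysis web service.
import json
from urllib import request

def run_analysis(dataset_id, params):
    # Package the user's data selection and analysis parameters.
    payload = json.dumps({"dataset": dataset_id, "params": params}).encode()
    req = request.Request(
        "https://analysis.example.org/api/run",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Block on the web service and relay its JSON result to the client.
    with request.urlopen(req) as resp:
        return json.load(resp)
```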