U.S. Department of Energy
Office of Scientific and Technical Information

A Job Sizing Strategy for High-Throughput Scientific Workflows

Journal Article · IEEE Transactions on Parallel and Distributed Systems
The user of a computing facility must make a critical decision when submitting jobs for execution: how many resources (such as cores, memory, and disk) should be requested for each job? If the request is too small, the job may fail due to resource exhaustion; if the request is too large, the job may succeed, but resources will be wasted. This decision is especially important when running hundreds of thousands of jobs in a high-throughput workflow, which may exhibit complex, long-tailed distributions of resource consumption. In this paper, we present a strategy for solving the job sizing problem: (1) applications are monitored and measured in user space as they run; (2) the resource usage is collected into an online archive; and (3) jobs are automatically sized according to historical data in order to maximize throughput or minimize waste. We evaluate the solution analytically and present case studies of applying the technique to high-throughput physics and bioinformatics workflows consisting of hundreds of thousands of jobs, demonstrating an increase in throughput of 10-400 percent compared to naive approaches.
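The third step of the strategy, sizing jobs from historical data, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `choose_first_allocation`, the two-attempt retry model (a job that exceeds its first allocation is rerun once at the largest allocation), and the assumption that all jobs run for the same duration are hypothetical simplifications chosen only to show the trade-off between undersized and oversized requests.

```python
from typing import Sequence, Tuple


def choose_first_allocation(peaks: Sequence[float], max_alloc: float) -> Tuple[float, float]:
    """Pick a first-attempt allocation from historical peak usage.

    Assumed model (illustrative, not taken from the paper):
      * every job is first run with allocation `a`;
      * a job whose peak usage exceeds `a` fails and is rerun once at `max_alloc`;
      * all jobs run equally long, so cost is measured in allocated units per job;
      * every historical peak is at most `max_alloc`.

    Returns (allocation, expected allocated units per job).
    """
    if not peaks:
        raise ValueError("need at least one historical measurement")

    n = len(peaks)
    # Baseline: give every job the maximum allocation, so nothing ever fails.
    best_alloc, best_cost = max_alloc, float(max_alloc)

    # Only observed peaks need to be considered as candidate allocations:
    # any value between two observed peaks fails the same jobs but costs more.
    for a in sorted(set(peaks)):
        failures = sum(1 for p in peaks if p > a)
        # Every job pays `a` once; failed jobs additionally pay `max_alloc`.
        cost = a + max_alloc * failures / n
        if cost < best_cost:
            best_alloc, best_cost = a, cost

    return best_alloc, best_cost


if __name__ == "__main__":
    # Hypothetical long-tailed history: nine jobs peak at 1 GB, one at 8 GB.
    history = [1, 1, 1, 1, 1, 1, 1, 1, 1, 8]
    alloc, cost = choose_first_allocation(history, max_alloc=8)
    print(f"first allocation: {alloc} GB, expected allocation per job: {cost:.2f} GB")
```

With this made-up history, the sketch selects the 1 GB first allocation: retrying the single large job at 8 GB costs less on average than requesting 8 GB for every job. A real implementation would also need to account for job run time and for the resources consumed before a failure is detected.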
Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science - Office of Basic Energy Sciences
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1472078
Journal Information:
Journal Name: IEEE Transactions on Parallel and Distributed Systems; Journal Issue: 2; Vol. 29; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English

Similar Records

Managing genomic variant calling workflows with Swift/T
Journal Article · Mon Jul 08 20:00:00 EDT 2019 · PLoS ONE · OSTI ID:1627877

CMS Workflow Execution using Intelligent Job Scheduling and Data Access Strategies
Journal Article · Tue Jan 31 23:00:00 EST 2012 · IEEE Trans.Nucl.Sci. · OSTI ID:1560853

Accelerating Scientific Workflows on HPC Platforms with In Situ Processing
Conference · Fri Dec 31 23:00:00 EST 2021 · OSTI ID:1888792
