skip to main content

Title: Toward Improved Support for Loosely Coupled Large Scale Simulation Workflows

High-performance computing (HPC) workloads are increasingly leveraging loosely coupled large scale simula- tions. Unfortunately, most large-scale HPC platforms, including Cray/ALPS environments, are designed for the execution of long-running jobs based on coarse-grained launch capabilities (e.g., one MPI rank per core on all allocated compute nodes). This assumption limits capability-class workload campaigns that require large numbers of discrete or loosely coupled simulations, and where time-to-solution is an untenable pacing issue. This paper describes the challenges related to the support of fine-grained launch capabilities that are necessary for the execution of loosely coupled large scale simulations on Cray/ALPS platforms. More precisely, we present the details of an enhanced runtime system to support this use case, and report on initial results from early testing on systems at Oak Ridge National Laboratory.
 [1] ;  [1] ;  [1] ;  [1]
  1. ORNL
Publication Date:
OSTI Identifier:
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: Annual Cray User Group, Lugano, Switzerland, 20140504, 20140508
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org:
USDOE Office of Science (SC)
Country of Publication:
United States
Framework; Runtime; Ensemble computing.