skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Build and Execute Environment

Software ·
OSTI ID:1356983

At exascale, the challenge becomes to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics. They must include down-sampling, post-processing, feature extraction, and visualization. Power and data transfer limitations require these analysis tasks to be run in-situ or in-transit. We expect successful workflows will comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and, therefore, must have rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows will compose linked computations from an assortment of reliable, well-defined computation elements, ones that can come and go as required, based on the needs of the workflow over time. We propose a novel framework that utilizes both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execution environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user-space. On this platform of containers and virtual machines, we will deliver workflow software that provides services, including repeatable execution, provenance, checkpointing, and future proofing. We will capture provenance about how containers were launched and how they interact to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware to preserve native performance. Containers will link development to continuous integration. When application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.

Short Name / Acronym:
BEE; 005280WKSTN00
Project Type:
Open Source under the BSD License.
Site Accession Number:
C17056
Version:
00
Programming Language(s):
Medium: X; OS: Linux 2.6
Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Contributing Organization:
Los Alamos National Laboratory (LANL)
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1356983
Country of Origin:
United States

Similar Records

Review of Enabling Technologies to Facilitate Secure Compute Customization
Technical Report · Mon Dec 01 00:00:00 EST 2014 · OSTI ID:1356983

Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.
Technical Report · Mon May 01 00:00:00 EDT 2017 · OSTI ID:1356983

Resiliency in numerical algorithm design for extreme scale simulations
Journal Article · Fri Dec 10 00:00:00 EST 2021 · International Journal of High Performance Computing Applications · OSTI ID:1356983

Related Subjects