Build and Execute Environment
Software · OSTI ID: 1356983
- LANL
At exascale, the challenge is to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics, including down-sampling, post-processing, feature extraction, and visualization. Power and data-transfer limitations require these analysis tasks to run in situ or in transit. We expect successful workflows to comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and therefore need rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows will compose linked computations from an assortment of reliable, well-defined computation elements that can come and go as the needs of the workflow change over time.
We propose a novel framework that uses both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execute environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user space. On this platform of containers and virtual machines, we will deliver workflow software that provides services including repeatable execution, provenance, checkpointing, and future-proofing.
We will capture provenance about how containers were launched and how they interact, and use it to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware and so preserve native performance. Containers will link development to continuous integration: when application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.
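The idea of coordinating snapshots with a barrier can be illustrated with a minimal sketch. It is not BEE code: Python threads stand in for parallel ranks, a deep copy of an in-memory dictionary stands in for a virtual machine snapshot, and the function name run_with_checkpoints and its parameters are hypothetical. The point it shows is that a snapshot taken while every rank is paused at the barrier captures a globally consistent state.

```python
import copy
import threading


def run_with_checkpoints(num_workers=4, steps=6, checkpoint_every=2):
    """Simulate ranks that pause at a barrier so a consistent snapshot can be taken.

    All names and parameters here are illustrative assumptions, not BEE's API.
    """
    state = {rank: 0 for rank in range(num_workers)}  # stand-in for application state
    checkpoints = []                                  # stand-in for stored VM snapshots

    def take_snapshot():
        # Called by exactly one thread once all workers have reached the barrier
        # and before any of them is released, so no rank is mid-update.
        checkpoints.append(copy.deepcopy(state))

    barrier = threading.Barrier(num_workers, action=take_snapshot)

    def worker(rank):
        for step in range(1, steps + 1):
            state[rank] += rank + 1          # one unit of simulated computation
            if step % checkpoint_every == 0:
                barrier.wait()               # snapshot happens while all ranks wait here

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints


if __name__ == "__main__":
    for i, snapshot in enumerate(run_with_checkpoints()):
        print(f"checkpoint {i}: {snapshot}")
```

Restarting from a checkpoint would amount to reloading one of the saved snapshots and resuming every rank from the corresponding step. In a real deployment the snapshot action would presumably call into the hypervisor rather than copy a dictionary.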
- Short Name / Acronym:
- BEE; 005280WKSTN00
- Project Type:
- Open Source under the BSD License.
- Site Accession Number:
- C17056
- Version:
- 00
- Programming Language(s):
- Medium: X; OS: Linux 2.6
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- Contributing Organization:
- Los Alamos National Laboratory (LANL)
- DOE Contract Number:
- AC52-06NA25396
- OSTI ID:
- 1356983
- Country of Origin:
- United States
Similar Records
Build and Execute Environment
Software · Sat May 13 20:00:00 EDT 2017 · OSTI ID: code-5319
Review of Enabling Technologies to Facilitate Secure Compute Customization
Technical Report · Sun Nov 30 23:00:00 EST 2014 · OSTI ID: 1195817
FOX: A Fault-Oblivious Extreme-Scale Execution Environment Boston University Final Report Project Number: DE-SC0005365
Technical Report · Sun Mar 17 00:00:00 EDT 2013 · OSTI ID: 1123493