OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Certification of Completion of Item 2 of ASC FY07 Level-2 Milestone ID #2380

Abstract

This report documents the completion of Item 2 of the three milestone deliverables that comprise Milestone ID 2380: Deploy selected Tri-Lab resource manager at LLNL and develop support model. Specifically: LLNL will integrate and support a commercial resource manager software product at LLNL to be used across the tri-lab HPC facilities.

Authors:
Lipari, D A
Publication Date:
2007-03-28
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
908136
Report Number(s):
UCRL-TR-229657
TRN: US200722%%496
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; LAWRENCE LIVERMORE NATIONAL LABORATORY; RESOURCE MANAGEMENT; LABORATORY EQUIPMENT

Citation Formats

Lipari, D A. Certification of Completion of Item 2 of ASC FY07 Level-2 Milestone ID #2380. United States: N. p., 2007. Web. doi:10.2172/908136.
Lipari, D A. Certification of Completion of Item 2 of ASC FY07 Level-2 Milestone ID #2380. United States. doi:10.2172/908136.
Lipari, D A. 2007. "Certification of Completion of Item 2 of ASC FY07 Level-2 Milestone ID #2380". United States. doi:10.2172/908136. https://www.osti.gov/servlets/purl/908136.
@article{osti_908136,
title = {Certification of Completion of Item 2 of ASC FY07 Level-2 Milestone ID {\#}2380},
author = {Lipari, D A},
abstractNote = {This report documents the completion of Item 2 of the three milestone deliverables that comprise Milestone ID 2380: Deploy selected Tri-Lab resource manager at LLNL and develop support model. Specifically: LLNL will integrate and support a commercial resource manager software product at LLNL to be used across the tri-lab HPC facilities.},
doi = {10.2172/908136},
place = {United States},
year = {2007},
month = {mar}
}

Technical Report: https://www.osti.gov/servlets/purl/908136

Similar Records:
  • This report documents the satisfaction of the completion criteria associated with ASC FY08 Milestone ID No. 2933: 'Deploy Moab resource management services on BlueGene/L'. Specifically, this milestone represents LLNL efforts to enhance both SLURM and Moab to extend Moab's capabilities to schedule and manage BlueGene/L and to increase the portability of user scripts between ASC systems. The completion criteria for the milestone are the following: (1) batch jobs can be specified, submitted to Moab, scheduled, and run on the BlueGene/L system; (2) Moab will be able to support the markedly increased scale in node count as well as the wiring geometry that is unique to BlueGene/L; and (3) Moab will also prepare and report statistics of job CPU usage just as it does for the current systems it supports. This document presents the completion evidence for both of the stated milestone certification methods: completion evidence for this milestone will be in the form of (1) documentation, a report that certifies that the completion criteria have been met, and (2) user hand-off. As the selected Tri-Lab workload manager, Moab was chosen to replace LCRM as the enterprise-wide scheduler across Livermore Computing (LC) systems. While LCRM/SLURM successfully scheduled jobs on BG/L, the effort to replace LCRM with Moab on BG/L represented a significant challenge. Moab is a commercial product developed and sold by Cluster Resources, Inc. (CRI). Moab receives users' batch job requests and dispatches these jobs to run on a specific cluster. SLURM is an open-source resource manager whose development is managed by members of the Integrated Computational Resource Management Group (ICRMG) within the Services and Development Division at LLNL. SLURM is responsible for launching and running jobs on an individual cluster. Replacing LCRM with Moab on BG/L required substantial changes to both Moab and SLURM. While the ICRMG could directly manage the SLURM development effort, the work to enhance Moab had to be done by Moab's vendor. Members of the ICRMG held many meetings with CRI developers to develop the design and specify the requirements for what Moab needed to do. Extensions to SLURM are used to run jobs on the BlueGene/L architecture. These extensions support the three-dimensional network topology unique to BG/L. While BG/L geometry support was already in SLURM, enhancements were needed to provide backfill capability and answer 'will-run' queries from Moab. For its part, the Moab architecture needed to be modified to interact with SLURM in a more coordinated way. It needed enhancements to support SLURM's shorthand notation for representing thousands of compute nodes and to report this information using Moab's existing status commands. The LCRM wrapper scripts that emulated LCRM commands also needed to be enhanced to support BG/L usage. The effort was successful: Moab 5.2.2 and SLURM 1.3 were installed on the 106,496-node BG/L machine on May 21, 2008, and turned over to the users to run production. (An illustrative job-submission sketch for this record appears after this list.)
  • This report describes the deployment and demonstration of the first phase of the I/O infrastructure for Purple. The report and the references herein are intended to certify the completion of the following Level 2 Milestone from the ASC FY04-05 Implementation Plan, due at the end of Quarter 4 in FY05. The milestone is defined as follows: "External networking infrastructure installation and performance analysis will be completed for the initial delivery of Purple. The external networking infrastructure includes incorporation of a new 10 Gigabit Ethernet fabric linking the platform to the LLNL High Performance Storage System (HPSS) and other center equipment. The LLNL archive will be upgraded to HPSS Release 5.1 to support the requirements of the machine, and performance analysis will be completed using the newly deployed I/O infrastructure. Demonstrated throughput to the archive for this infrastructure will be a minimum of 1.5 GB/s with a target of 3 GB/s. Since Purple delivery is not scheduled until late Q3, demonstration of these performance goals will use parts of Purple and/or an aggregate of other existing resources." (An illustrative throughput check for this record appears after this list.)
  • There has been substantial development of the Lustre parallel filesystem prior to the configuration described below for this milestone. The initial Lustre filesystems that were deployed were directly connected to the cluster interconnect, i.e., Quadrics Elan3. That is, the clients, OSSes (Object Storage Servers), and Metadata Servers (MDS) were all directly connected to the cluster's internal high-speed interconnect. This configuration serves a single cluster very well, but does not provide sharing of the filesystem among clusters. LLNL funded the development of high-efficiency "portals router" code by CFS (the company that develops Lustre) to enable us to move the Lustre servers to a GigE-connected network configuration, thus making it possible to connect to the servers from several clusters. With portals routing available, here is what changes: (1) another storage-only cluster is deployed to front the Lustre storage devices (these become the Lustre OSSes and MDS), (2) this "Lustre cluster" is attached via GigE connections to a large GigE switch/router cloud, (3) a small number of compute-cluster nodes are designated as "gateway" or "portals router" nodes, and (4) the portals router nodes are GigE-connected to the switch/router cloud. The Lustre configuration is then changed to reflect the new network paths. A typical example of this is a compute cluster and a related visualization cluster: the compute cluster produces the data (writes it to the Lustre filesystem), and the visualization cluster consumes some of the data (reads it from the Lustre filesystem). This process can be expanded by aggregating several collections of Lustre backend storage resources into one or more "centralized" Lustre filesystems, and then arranging to have several "client" clusters mount these centralized filesystems. The "client" clusters can be any combination of compute, visualization, archiving, or other types of clusters. This milestone demonstrates the operation and performance of a scaled-down version of such a large, centralized, shared Lustre filesystem concept. (An illustrative routing-configuration sketch for this record appears after this list.)
  • This summary report describes data management and visualization activities in the Advanced Simulation and Computing (ASC) program at Lawrence Livermore National Laboratory (LLNL). The report covers the period from approximately October 2003 to June 2004 and describes activities within the Visual Interactive Environment for Weapons Simulation (VIEWS) ASC program element. This report and the references herein are intended to document the completion of the following Level 2 Milestone from the ASC FY04-05 Implementation Plan, due at the end of Quarter 3 in FY04:
  • In 2015, the three Department of Energy (DOE) National Laboratories that make up the Advanced Simulation and Computing (ASC) Program (Sandia, Lawrence Livermore, and Los Alamos) collaboratively explored performance portability programming environments in the context of several ASC co-design proxy applications as part of a tri-lab L2 milestone executed by the co-design teams at each laboratory. The programming environments that were studied included Kokkos (developed at Sandia), RAJA (LLNL), and Legion (Stanford University). The proxy apps studied included miniAero, LULESH, CoMD, Kripke, and SNAP. These programming models and proxy apps are described herein. Each lab focused on a particular combination of abstractions and proxy apps, with the goal of assessing performance portability using those. Performance portability was determined by: (a) the ability to run a single application source code on multiple advanced architectures, (b) comparing runtime performance between "native" and "portable" implementations, and (c) the degree to which these abstractions can improve programmer productivity by allowing non-portable implementation details to be hidden from the application developer. This report captures the work that was completed for this milestone and outlines future co-design work to be performed by application developers, programming environment developers, compiler writers, and hardware vendors. (An illustrative runtime-comparison sketch for this record appears after this list.)
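The BlueGene/L record above states that batch jobs can be specified, submitted to Moab, scheduled, and run through SLURM (criterion 1). The minimal sketch below is not taken from that report; it only illustrates what such a submission could look like, assuming Moab's msub command and #MSUB directives are available and that SLURM's srun accepts a --geometry option for the BG/L torus shape at the site in question. The node count, walltime, and application name are placeholders.

#!/usr/bin/env python3
"""Minimal sketch: compose a Moab batch script and submit it with msub.

Assumptions (not from the report): msub is on PATH, #MSUB directives are
honored, and srun's --geometry flag matches the local BG/L setup.
"""
import subprocess
import tempfile


def submit_bgl_job(nodes: int, geometry: str, walltime: str, command: str) -> str:
    """Write a Moab batch script to a temp file, submit it, return msub's output."""
    script = f"""#!/bin/bash
#MSUB -N bgl_demo
#MSUB -l nodes={nodes}
#MSUB -l walltime={walltime}
#MSUB -o bgl_demo.out

# srun launches the job step under SLURM; --geometry expresses the 3-D
# torus shape (XxYxZ) unique to BlueGene/L, per the record above.
srun --geometry={geometry} {command}
"""
    with tempfile.NamedTemporaryFile("w", suffix=".msub", delete=False) as f:
        f.write(script)
        path = f.name
    # msub normally prints the Moab job id on success.
    result = subprocess.run(["msub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(submit_bgl_job(nodes=512, geometry="8x8x8", walltime="01:00:00",
                         command="./my_app"))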
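The Purple I/O record above sets a throughput floor of 1.5 GB/s and a target of 3 GB/s to the archive. The short sketch below is illustrative arithmetic only, not a measurement procedure from that report; the byte counts and timings are placeholders.

"""Illustrative arithmetic only: check a measured archive transfer rate against
the milestone's 1.5 GB/s minimum and 3 GB/s target. The numbers are placeholders,
not results from the report."""

MIN_GBPS = 1.5
TARGET_GBPS = 3.0


def throughput_gbps(bytes_moved: float, seconds: float) -> float:
    """Aggregate rate in gigabytes per second (10**9 bytes)."""
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    rate = throughput_gbps(900e9, 500.0)   # e.g. 900 GB moved in 500 s -> 1.8 GB/s
    minimum = "meets" if rate >= MIN_GBPS else "misses"
    target = "meets" if rate >= TARGET_GBPS else "misses"
    print(f"{rate:.2f} GB/s: {minimum} the 1.5 GB/s minimum, {target} the 3 GB/s target")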
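The Lustre record above describes moving the OSS/MDS nodes onto a GigE network and reaching them from compute clusters through designated gateway (portals router) nodes. The sketch below only illustrates that layout by emitting candidate LNET module-option lines; the parameter names (networks, routes, forwarding) follow Lustre's LNET conventions, but the elan0/tcp0 network names, gateway identifiers, and addresses are assumptions rather than the configuration actually used for the milestone.

"""Minimal sketch of the routed-Lustre layout described above: OSS/MDS nodes on
GigE, a few compute-cluster nodes acting as LNET routers between the Quadrics
Elan interconnect and GigE, and compute nodes reaching the servers via those
routers. All network names, identifiers, and addresses are illustrative
assumptions, not the milestone's actual configuration."""


def router_options(eth_if: str = "eth0") -> str:
    # Gateway nodes sit on both networks and forward traffic between them.
    return f'options lnet networks="elan0,tcp0({eth_if})" forwarding="enabled"'


def compute_node_options(gateway_elan_ids: str) -> str:
    # Compute nodes see only the Elan network; traffic bound for the GigE-side
    # servers (tcp0) goes through the designated gateway nodes.
    return f'options lnet networks="elan0" routes="tcp0 {gateway_elan_ids}@elan0"'


def server_options(gateway_tcp_ips: str, eth_if: str = "eth0") -> str:
    # OSS/MDS nodes live on GigE and route back to the Elan side via the same gateways.
    return f'options lnet networks="tcp0({eth_if})" routes="elan0 {gateway_tcp_ips}@tcp0"'


if __name__ == "__main__":
    print(router_options())
    print(compute_node_options("[1-4]"))          # hypothetical gateway Elan IDs
    print(server_options("192.168.10.[1-4]"))     # hypothetical gateway GigE addresses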
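The performance-portability record above compares runtime between "native" and "portable" implementations (criterion b). The sketch below shows the simple ratio such a comparison implies; the architecture names and timings are hypothetical placeholders, not measurements from the milestone.

"""Illustrative arithmetic for criterion (b) above: how much of the native
implementation's performance a portable (e.g. Kokkos, RAJA, or Legion) version
retains on each architecture. The timings are hypothetical placeholders."""


def retained_performance(native_seconds: float, portable_seconds: float) -> float:
    """Fraction of native performance retained by the portable version."""
    return native_seconds / portable_seconds


if __name__ == "__main__":
    runs = {  # architecture -> (native seconds, portable seconds), hypothetical
        "cpu_arch_a": (10.0, 11.5),
        "gpu_arch_b": (4.0, 4.6),
    }
    for arch, (native, portable) in runs.items():
        print(f"{arch}: {retained_performance(native, portable):.0%} of native performance")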