OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Certification of Completion of ASC FY08 Level-2 Milestone ID #2933

Abstract

This report documents the satisfaction of the completion criteria associated with ASC FY08 Milestone ID No. 2933: 'Deploy Moab resource management services on BlueGene/L'. Specifically, this milestone represents LLNL efforts to enhance both SLURM and Moab to extend Moab's capabilities to schedule and manage BlueGene/L, and to increase portability of user scripts between ASC systems. The completion criteria for the milestone are the following: (1) Batch jobs can be specified, submitted to Moab, scheduled and run on the BlueGene/L system; (2) Moab will be able to support the markedly increased scale in node count as well as the wiring geometry that is unique to BlueGene/L; and (3) Moab will also prepare and report statistics of job CPU usage just as it does for the current systems it supports. This document presents the completion evidence for both of the stated milestone certification methods: Completion evidence for this milestone will be in the form of (1) documentation--a report that certifies that the completion criteria have been met; and (2) user hand-off. As the selected Tri-Lab workload manager, Moab was chosen to replace LCRM as the enterprise-wide scheduler across Livermore Computing (LC) systems. While LCRM/SLURM successfully scheduled jobs on BG/L, the effort to replace LCRM with Moab on BG/L represented a significant challenge. Moab is a commercial product developed and sold by Cluster Resources, Inc. (CRI). Moab receives users' batch job requests and dispatches these jobs to run on a specific cluster. SLURM is an open-source resource manager whose development is managed by members of the Integrated Computational Resource Management Group (ICRMG) within the Services and Development Division at LLNL. SLURM is responsible for launching and running jobs on an individual cluster. Replacing LCRM with Moab on BG/L required substantial changes to both Moab and SLURM.
While the ICRMG could directly manage the SLURM development effort, the work to enhance Moab had to be done by Moab's vendor. Members of the ICRMG held many meetings with CRI developers to develop the design and specify the requirements for what Moab needed to do. Extensions to SLURM are used to run jobs on the BlueGene/L architecture. These extensions support the three-dimensional network topology unique to BG/L. While BG/L geometry support was already in SLURM, enhancements were needed to provide backfill capability and answer 'will-run' queries from Moab. For its part, the Moab architecture needed to be modified to interact with SLURM in a more coordinated way. It needed enhancements to support SLURM's shorthand notation for representing thousands of compute nodes and report this information using Moab's existing status commands. The LCRM wrapper scripts that emulated LCRM commands also needed to be enhanced to support BG/L usage. The effort was successful: Moab 5.2.2 and SLURM 1.3 were installed on the 106,496-node BG/L machine on May 21, 2008, and turned over to the users to run production.

Authors:
Lipari, D A
Publication Date:
June 2008
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
945523
Report Number(s):
LLNL-TR-404701
TRN: US200904%%74
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; ARCHITECTURE; DESIGN; GEOMETRY; LAUNCHING; LAWRENCE LIVERMORE NATIONAL LABORATORY; PRODUCTION; RESOURCE MANAGEMENT; SCHEDULES; STATISTICS; TOPOLOGY

Citation Formats

Lipari, D A. Certification of Completion of ASC FY08 Level-2 Milestone ID #2933. United States: N. p., 2008. Web. doi:10.2172/945523.
Lipari, D A. Certification of Completion of ASC FY08 Level-2 Milestone ID #2933. United States. doi:10.2172/945523.
Lipari, D A. Thu . "Certification of Completion of ASC FY08 Level-2 Milestone ID #2933". United States. doi:10.2172/945523. https://www.osti.gov/servlets/purl/945523.
@article{osti_945523,
title = {Certification of Completion of ASC FY08 Level-2 Milestone ID #2933},
author = {Lipari, D A},
doi = {10.2172/945523},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2008},
month = {6}
}
