OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Certification of Completion of Level-2 Milestone 461: Deploy First Phase of I/O Infrastructure for Purple

Abstract

This report describes the deployment and demonstration of the first phase of the I/O infrastructure for Purple. The report and the references herein are intended to certify the completion of the following Level 2 Milestone from the ASC FY04-05 Implementation Plan, due at the end of Quarter 4 in FY05. The milestone is defined as follows: "External networking infrastructure installation and performance analysis will be completed for the initial delivery of Purple. The external networking infrastructure includes incorporation of a new 10 Gigabit Ethernet fabric linking the platform to the LLNL High Performance Storage System (HPSS) and other center equipment. The LLNL archive will be upgraded to HPSS Release 5.1 to support the requirements of the machine and performance analysis will be completed using the newly deployed I/O infrastructure. Demonstrated throughput to the archive for this infrastructure will be a minimum of 1.5 GB/s with a target of 3 GB/s. Since Purple delivery is not scheduled until late Q3, demonstration of these performance goals will use parts of Purple and/or an aggregate of other existing resources."
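
As a rough illustration of the quantitative criterion above, the Python sketch below (all figures are assumptions, not measurements from the report) computes aggregate archive throughput from bytes moved and elapsed time and compares it with the 1.5 GB/s minimum and 3 GB/s target.

    # Minimal sketch: check a measured archive transfer rate against the
    # Milestone 461 criteria (1.5 GB/s minimum, 3 GB/s target). The sample
    # figures are illustrative assumptions, not measurements from the report.

    MIN_RATE_GBPS = 1.5     # required minimum aggregate throughput (GB/s)
    TARGET_RATE_GBPS = 3.0  # stretch target (GB/s)

    def aggregate_throughput_gbps(total_bytes: float, elapsed_seconds: float) -> float:
        """Aggregate throughput in GB/s, assuming 1 GB = 10**9 bytes."""
        return total_bytes / 1e9 / elapsed_seconds

    if __name__ == "__main__":
        # Hypothetical run: 5.4 TB moved to the HPSS archive in 30 minutes.
        measured = aggregate_throughput_gbps(total_bytes=5.4e12, elapsed_seconds=30 * 60)
        print(f"measured throughput: {measured:.2f} GB/s")
        print(f"meets 1.5 GB/s minimum: {measured >= MIN_RATE_GBPS}")
        print(f"meets 3.0 GB/s target:  {measured >= TARGET_RATE_GBPS}")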

Authors:
Gary, M; Wiltzius, D
Publication Date:
2005-11-17
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
888607
Report Number(s):
UCRL-TR-217288
TRN: US200618%%438
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; IMPLEMENTATION; LAWRENCE LIVERMORE NATIONAL LABORATORY; PERFORMANCE; STORAGE; TARGETS

Citation Formats

Gary, M, and Wiltzius, D. Certification of Completion of Level-2 Milestone 461: Deploy First Phase of I/O Infrastructure for Purple. United States: N. p., 2005. Web. doi:10.2172/888607.
Gary, M, & Wiltzius, D. Certification of Completion of Level-2 Milestone 461: Deploy First Phase of I/O Infrastructure for Purple. United States. doi:10.2172/888607.
Gary, M, and Wiltzius, D. Thu, Nov 17, 2005. "Certification of Completion of Level-2 Milestone 461: Deploy First Phase of I/O Infrastructure for Purple". United States. doi:10.2172/888607. https://www.osti.gov/servlets/purl/888607.
@techreport{osti_888607,
  title = {Certification of Completion of Level-2 Milestone 461: Deploy First Phase of I/O Infrastructure for Purple},
  author = {Gary, M and Wiltzius, D},
  abstractNote = {This report describes the deployment and demonstration of the first phase of the I/O infrastructure for Purple. The report and the references herein are intended to certify the completion of the following Level 2 Milestone from the ASC FY04-05 Implementation Plan, due at the end of Quarter 4 in FY05. The milestone is defined as follows: "External networking infrastructure installation and performance analysis will be completed for the initial delivery of Purple. The external networking infrastructure includes incorporation of a new 10 Gigabit Ethernet fabric linking the platform to the LLNL High Performance Storage System (HPSS) and other center equipment. The LLNL archive will be upgraded to HPSS Release 5.1 to support the requirements of the machine and performance analysis will be completed using the newly deployed I/O infrastructure. Demonstrated throughput to the archive for this infrastructure will be a minimum of 1.5 GB/s with a target of 3 GB/s. Since Purple delivery is not scheduled until late Q3, demonstration of these performance goals will use parts of Purple and/or an aggregate of other existing resources."},
  institution = {Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)},
  number = {UCRL-TR-217288},
  doi = {10.2172/888607},
  url = {https://www.osti.gov/servlets/purl/888607},
  place = {United States},
  year = {2005},
  month = {nov}
}

Similar Records:
  • This summary report describes data management and visualization activities in the Advanced Simulation and Computing (ASC) program at Lawrence Livermore National Laboratory (LLNL). The report covers the period from approximately October 2003 to June 2004 and describes activities within the Visual Interactive Environment for Weapons Simulation (VIEWS) ASC program element. This report and the references herein are intended to document the completion of the following Level 2 Milestone from the ASC FY04-05 Implementation Plan, due at the end of Quarter 3 in FY04:
  • There has been substantial development of the Lustre parallel filesystem prior to the configuration described below for this milestone. The initial Lustre filesystems that were deployed were directly connected to the cluster interconnect, i.e. Quadrics Elan3. That is, the clients, Object Storage Servers (OSSes), and Metadata Servers (MDS) were all directly connected to the cluster's internal high-speed interconnect. This configuration serves a single cluster very well, but does not provide sharing of the filesystem among clusters. LLNL funded the development of high-efficiency "portals router" code by CFS (the company that develops Lustre) to enable us to move the Lustre servers to a GigE-connected network configuration, thus making it possible to connect to the servers from several clusters. With portals routing available, here is what changes: (1) another storage-only cluster is deployed to front the Lustre storage devices (these become the Lustre OSSes and MDS), (2) this "Lustre cluster" is attached via GigE connections to a large GigE switch/router cloud, (3) a small number of compute-cluster nodes are designated as "gateway" or "portals router" nodes, and (4) the portals router nodes are GigE-connected to the switch/router cloud. The Lustre configuration is then changed to reflect the new network paths. A typical example of this is a compute cluster and a related visualization cluster: the compute cluster produces the data (writes it to the Lustre filesystem), and the visualization cluster consumes some of the data (reads it from the Lustre filesystem). This process can be expanded by aggregating several collections of Lustre backend storage resources into one or more "centralized" Lustre filesystems, and then arranging to have several "client" clusters mount these centralized filesystems. The "client clusters" can be any combination of compute, visualization, archiving, or other types of cluster. This milestone demonstrates the operation and performance of a scaled-down version of such a large, centralized, shared Lustre filesystem concept (a minimal topology sketch appears after this list).
  • This report documents the completion of Item 2 of the three milestone deliverables that comprise Milestone ID 2380: Deploy selected Tri-Lab resource manager at LLNL and develop support model. Specifically: LLNL will integrate and support a commercial resource manager software product at LLNL to be used across the tri-lab HPC facilities.
  • On July 7th, 2006, the Purple Level-1 Review Committee convened and was presented with evidence of the completion of Level-2 Milestone 461 (Deploy First Phase of I/O Infrastructure for Purple), which was performed in direct support of the Purple Level-1 milestone. This evidence included a short presentation and the formal documentation of Milestone No. 461 (see UCRL-TR-217288). Following the meeting, the Committee asked for the following additional evidence: (1) Set a speed measurement/goal/target assuming a number of files that the user needs to get into the archives, then redo the benchmark using whatever tool(s) the labs prefer (HTAR, for example), and document how long the process takes. (2) Develop a test to read files back to confirm that what the user gets out of the archive is what the user put into the archive (a minimal read-back verification sketch appears after this list). This evidence has been collected and is presented here.
  • This report documents the satisfaction of the completion criteria associated with ASC FY08 Milestone ID No. 2933: 'Deploy Moab resource management services on BlueGene/L'. Specifically, this milestone represents LLNL efforts to enhance both SLURM and Moab to extend Moab's capabilities to schedule and manage BlueGene/L, and increases portability of user scripts between ASC systems. The completion criteria for the milestone are the following: (1) batch jobs can be specified, submitted to Moab, scheduled, and run on the BlueGene/L system; (2) Moab will be able to support the markedly increased scale in node count as well as the wiring geometry that is unique to BlueGene/L; and (3) Moab will also prepare and report statistics of job CPU usage just as it does for the current systems it supports. This document presents the completion evidence for both of the stated milestone certification methods: completion evidence for this milestone will be in the form of (1) documentation: a report that certifies that the completion criteria have been met; and (2) user hand-off. As the selected Tri-Lab workload manager, Moab was chosen to replace LCRM as the enterprise-wide scheduler across Livermore Computing (LC) systems. While LCRM/SLURM successfully scheduled jobs on BG/L, the effort to replace LCRM with Moab on BG/L represented a significant challenge. Moab is a commercial product developed and sold by Cluster Resources, Inc. (CRI). Moab receives the users' batch job requests and dispatches these jobs to run on a specific cluster. SLURM is an open-source resource manager whose development is managed by members of the Integrated Computational Resource Management Group (ICRMG) within the Services and Development Division at LLNL. SLURM is responsible for launching and running jobs on an individual cluster. Replacing LCRM with Moab on BG/L required substantial changes to both Moab and SLURM. While the ICRMG could directly manage the SLURM development effort, the work to enhance Moab had to be done by Moab's vendor. Members of the ICRMG held many meetings with CRI developers to develop the design and specify the requirements for what Moab needed to do. Extensions to SLURM are used to run jobs on the BlueGene/L architecture. These extensions support the three-dimensional network topology unique to BG/L. While BG/L geometry support was already in SLURM, enhancements were needed to provide backfill capability and answer 'will-run' queries from Moab. For its part, the Moab architecture needed to be modified to interact with SLURM in a more coordinated way. It needed enhancements to support SLURM's shorthand notation for representing thousands of compute nodes (a simplified expansion of this shorthand is sketched after this list) and to report this information using Moab's existing status commands. The LCRM wrapper scripts that emulated LCRM commands also needed to be enhanced to support BG/L usage. The effort was successful: Moab 5.2.2 and SLURM 1.3 were installed on the 106,496-node BG/L machine on May 21, 2008, and turned over to the users to run production.
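
For the routed Lustre configuration described in the second record above, the following Python sketch models the topology only, not the Lustre software itself: the cluster, gateway, and server names are illustrative assumptions, and the code simply traces the path I/O takes from a client cluster through a designated portals-router (gateway) node and the GigE switch/router cloud to a GigE-attached OSS.

    # Minimal topology sketch of a centralized, shared Lustre filesystem reached
    # through portals-router gateway nodes and a GigE switch/router cloud.
    # All names and counts below are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class LustreCluster:
        """Storage-only cluster fronting the Lustre devices (OSSes plus an MDS)."""
        name: str
        oss_nodes: list[str]
        mds_node: str

    @dataclass
    class ClientCluster:
        """A compute, visualization, or archive cluster that mounts the filesystem."""
        name: str
        gateway_nodes: list[str] = field(default_factory=list)  # portals-router nodes on GigE

        def io_path(self, fs: LustreCluster) -> list[str]:
            # Traffic leaves the cluster's internal interconnect (e.g. Elan3) through a
            # gateway node, crosses the GigE switch/router cloud, and lands on an OSS.
            if not self.gateway_nodes:
                raise RuntimeError(f"{self.name} has no portals-router nodes configured")
            return [f"{self.name}:compute-node",
                    f"{self.name}:{self.gateway_nodes[0]}",
                    "gige-switch-cloud",
                    f"{fs.name}:{fs.oss_nodes[0]}"]

    if __name__ == "__main__":
        fs = LustreCluster("lustre-center", oss_nodes=["oss0", "oss1"], mds_node="mds0")
        compute = ClientCluster("compute-cluster", gateway_nodes=["gw0", "gw1"])
        viz = ClientCluster("viz-cluster", gateway_nodes=["gw0"])
        # The compute cluster writes data; the visualization cluster reads it back
        # through the same shared, centralized filesystem.
        print(" -> ".join(compute.io_path(fs)))
        print(" -> ".join(viz.io_path(fs)))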
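
For the read-back verification requested in the fourth record above, here is a minimal, hedged sketch of one way such a test could work: checksum each file before it is archived, retrieve the files again, and compare checksums. The directory names are hypothetical, and the actual archive/retrieve step (HTAR or another HPSS tool) is deliberately left outside the sketch.

    # Minimal sketch of a read-back check: confirm that every file retrieved from
    # the archive matches the checksum of the file that was stored. The directory
    # layout ("outbound" originals, "inbound" retrieved copies) is hypothetical.

    import hashlib
    from pathlib import Path

    def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
        """Stream a file through SHA-1 so large files need not fit in memory."""
        digest = hashlib.sha1()
        with path.open("rb") as f:
            while chunk := f.read(chunk_size):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_roundtrip(originals: Path, retrieved: Path) -> bool:
        """Return True only if every retrieved file matches its original checksum."""
        ok = True
        for original in sorted(originals.rglob("*")):
            if not original.is_file():
                continue
            copy = retrieved / original.relative_to(originals)
            if not copy.is_file() or sha1_of(copy) != sha1_of(original):
                print(f"MISMATCH: {original}")
                ok = False
        return ok

    if __name__ == "__main__":
        print("round trip OK:", verify_roundtrip(Path("outbound"), Path("inbound")))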
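
The last record above notes that Moab needed to understand SLURM's shorthand notation for describing thousands of compute nodes. As a rough, simplified illustration only (this is not SLURM's actual parser, and it ignores BG/L's 3-D rectangular ranges), the sketch below expands a bracketed hostlist of the general 'prefix[ranges]' form.

    # Simplified expansion of a bracketed hostlist shorthand, e.g. "bgl[000-003,010]".
    # Illustrative only: real SLURM/BG/L hostlists are richer than this.

    import re

    def expand_hostlist(expr: str) -> list[str]:
        match = re.fullmatch(r"(\w+)\[([\d,\-]+)\]", expr)
        if match is None:
            return [expr]  # plain host name, nothing to expand
        prefix, spec = match.groups()
        hosts: list[str] = []
        for part in spec.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # preserve zero padding
                hosts += [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]
            else:
                hosts.append(f"{prefix}{part}")
        return hosts

    if __name__ == "__main__":
        print(expand_hostlist("bgl[000-003,010]"))
        # ['bgl000', 'bgl001', 'bgl002', 'bgl003', 'bgl010']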