OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tri-Laboratory Linux Capacity Cluster 2007 SOW

Abstract

The Advanced Simulation and Computing (ASC) Program (formerly known as the Accelerated Strategic Computing Initiative, ASCI) has led the world in capability computing for the last ten years. Capability computing is defined as a world-class platform (in the top 10 of the TOP500 list) with scientific simulations running at scale on that platform. Example systems are ASCI Red, Blue-Pacific, Blue-Mountain, White, Q, Red Storm, and Purple. ASC applications have scaled to thousands of CPUs and accomplished a long list of mission milestones on these ASC capability platforms. However, the computing demands of the ASC and Stockpile Stewardship programs also include a vast number of smaller-scale runs for day-to-day simulations. Indeed, every 'hero' capability run requires hundreds to thousands of much smaller runs for preparation and post-processing. In addition, many aspects of the Stockpile Stewardship Program (SSP) can be accomplished directly with these so-called 'capacity' calculations. The need for capacity is now so great within the program that it is increasingly difficult to allocate the computer resources required by the larger capability runs. To rectify the current capacity computing shortfall, the ASC program has allocated a large portion of the overall ASC platforms budget to 'capacity' systems. In addition, within the next five to ten years the Life Extension Programs (LEPs) for major nuclear weapons systems must be accomplished. These LEPs and other SSP programmatic elements will further drive the need for capacity calculations, and hence for 'capacity' systems, as well as for future ASC capability calculations on 'capability' systems. In response to this workload analysis, the ASC program will make a large, sustained strategic investment in capacity systems over the next ten years, starting with United States Government Fiscal Year 2007 (GFY07).
However, given the growing need for 'capability' systems as well, the budget demands are extreme, and new, more cost-effective ways of fielding these systems must be developed. The Tri-Laboratory Linux Capacity Cluster (TLCC) procurement is the ASC program's first investment vehicle for these capacity systems. It also represents a new strategy for quickly building, fielding, and integrating many Linux clusters of various sizes into classified and unclassified production service through the concept of Scalable Units (SUs). The programmatic objective is to dramatically reduce the Total Cost of Ownership (TCO) of these 'capacity' systems relative to current best practices in Linux cluster deployment. This objective is meaningful only if these systems quickly become robust, useful production clusters under the crushing load that the ASC and SSP scientific simulation capacity workload will place on them.

Authors:
Seager, M
Publication Date:
March 22, 2007
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1036298
Report Number(s):
UCRL-SR-229520
TRN: US1201387
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 98 NUCLEAR DISARMAMENT, SAFEGUARDS, AND PHYSICAL PROTECTION; CAPACITY; COMPUTERS; NUCLEAR WEAPONS; PROCESSING; PROCUREMENT; PRODUCTION; SIMULATION; STOCKPILES

Citation Formats

Seager, M. Tri-Laboratory Linux Capacity Cluster 2007 SOW. United States: N. p., 2007. Web. doi:10.2172/1036298.
Seager, M. (2007). Tri-Laboratory Linux Capacity Cluster 2007 SOW. United States. doi:10.2172/1036298.
Seager, M. 2007. "Tri-Laboratory Linux Capacity Cluster 2007 SOW". United States. doi:10.2172/1036298. https://www.osti.gov/servlets/purl/1036298.
@techreport{osti_1036298,
title = {Tri-Laboratory {Linux} Capacity Cluster 2007 {SOW}},
author = {Seager, M},
abstractNote = {The Advanced Simulation and Computing (ASC) Program (formerly known as the Accelerated Strategic Computing Initiative, ASCI) has led the world in capability computing for the last ten years. Capability computing is defined as a world-class platform (in the top 10 of the TOP500 list) with scientific simulations running at scale on that platform. Example systems are ASCI Red, Blue-Pacific, Blue-Mountain, White, Q, Red Storm, and Purple. ASC applications have scaled to thousands of CPUs and accomplished a long list of mission milestones on these ASC capability platforms. However, the computing demands of the ASC and Stockpile Stewardship programs also include a vast number of smaller-scale runs for day-to-day simulations. Indeed, every 'hero' capability run requires hundreds to thousands of much smaller runs for preparation and post-processing. In addition, many aspects of the Stockpile Stewardship Program (SSP) can be accomplished directly with these so-called 'capacity' calculations. The need for capacity is now so great within the program that it is increasingly difficult to allocate the computer resources required by the larger capability runs. To rectify the current capacity computing shortfall, the ASC program has allocated a large portion of the overall ASC platforms budget to 'capacity' systems. In addition, within the next five to ten years the Life Extension Programs (LEPs) for major nuclear weapons systems must be accomplished. These LEPs and other SSP programmatic elements will further drive the need for capacity calculations, and hence for 'capacity' systems, as well as for future ASC capability calculations on 'capability' systems. In response to this workload analysis, the ASC program will make a large, sustained strategic investment in capacity systems over the next ten years, starting with United States Government Fiscal Year 2007 (GFY07).
However, given the growing need for 'capability' systems as well, the budget demands are extreme, and new, more cost-effective ways of fielding these systems must be developed. The Tri-Laboratory Linux Capacity Cluster (TLCC) procurement is the ASC program's first investment vehicle for these capacity systems. It also represents a new strategy for quickly building, fielding, and integrating many Linux clusters of various sizes into classified and unclassified production service through the concept of Scalable Units (SUs). The programmatic objective is to dramatically reduce the Total Cost of Ownership (TCO) of these 'capacity' systems relative to current best practices in Linux cluster deployment. This objective is meaningful only if these systems quickly become robust, useful production clusters under the crushing load that the ASC and SSP scientific simulation capacity workload will place on them.},
doi = {10.2172/1036298},
institution = {Lawrence Livermore National Lab. (LLNL)},
address = {Livermore, CA (United States)},
number = {UCRL-SR-229520},
place = {United States},
year = {2007},
month = mar
}
