STORM: Lightning-Fast Resource Management
Eitan Frachtenberg Fabrizio Petrini Juan Fernandez Scott Pakin
Salvador Coll
CCS-3 Modeling, Algorithms, and Informatics Group
Computer and Computational Sciences (CCS) Division
Los Alamos National Laboratory
{eitanf,fabrizio,juanf,pakin,scoll}@lanl.gov
July 26, 2002
Abstract
Although workstation clusters are a common platform for high-performance computing
(HPC), they remain more difficult to manage than sequential systems or even symmetric
multiprocessors. Furthermore, as cluster sizes increase, the quality of the
resource-management subsystem--essentially, all of the code that runs on a cluster
other than the applications--increasingly impacts application efficiency. In this
paper, we present STORM, a resource-management framework designed for scalability
and performance. The key innovation behind STORM is a software architecture that
enables resource management to exploit low-level network features. As a result of
this HPC-application-like design, STORM is orders of magnitude faster than the best
reported results in the literature on two sample resource-management functions:
job launching and process scheduling.
1 Introduction