OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Management of Virtual Large-scale High-performance Computing Systems

Conference · OSTI ID: 1024319

Linux is widely used on high-performance computing (HPC) systems, from commodity clusters to Cray supercomputers (which run the Cray Linux Environment). These platforms differ mainly in their system configuration: some use only SSH to access compute nodes, whereas others employ full resource management systems (e.g., Torque and ALPS on Cray XT systems). Furthermore, recent improvements in system-level virtualization, such as hardware support for virtualization, virtual machine migration for system resilience, and reduced virtualization overheads, make it practical to use virtual machines on HPC platforms. Currently, tools for managing virtual machines on HPC systems are still quite basic and are often tightly coupled to the target platform. In this document, we present a new system tool for managing virtual machines on large-scale HPC systems, including a run-time system and support for all major virtualization solutions. The proposed solution rests on two key aspects. First, Virtual System Environments (VSEs), introduced in a previous study, provide a flexible way to define the software environment used within virtual machines. Second, we propose a new system run-time for the management and deployment of VSEs on HPC systems that supports a wide range of system configurations. For instance, this generic run-time can interact with resource managers such as Torque to manage virtual machines. Finally, the proposed solution provides abstractions that allow it to work with a variety of virtualization solutions on different Linux HPC platforms, including Xen, KVM and the HPC-oriented Palacios.
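The abstract does not describe the tool's actual interfaces. The minimal Python sketch below illustrates, under stated assumptions, the kind of layering it outlines: a run-time takes a VSE and the list of nodes handed out by a resource manager, then starts VMs through a pluggable per-hypervisor backend. Every name here (`VSE` and its fields, `HypervisorBackend`, `SimulatedBackend`, `BACKENDS`, `parse_nodefile`, `deploy`) is hypothetical. The backends only record what they would do and never contact Xen, KVM or Palacios. The node-list parser assumes the content of Torque's `$PBS_NODEFILE`, which may list a host once per allocated processor.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class VSE:
    """Hypothetical, heavily simplified Virtual System Environment:
    a named description of the software stack each VM should run."""
    name: str
    image: str
    packages: list[str] = field(default_factory=list)


class HypervisorBackend(ABC):
    """Hypothetical interface hiding hypervisor-specific details
    (Xen, KVM, Palacios) from the deployment run-time."""

    name: str

    @abstractmethod
    def start_vm(self, node: str, vse: VSE) -> str:
        """Boot one VM for `vse` on `node` and return a VM identifier."""

    @abstractmethod
    def stop_vm(self, vm_id: str) -> None:
        """Shut down the VM identified by `vm_id`."""


class SimulatedBackend(HypervisorBackend):
    """Placeholder driver: tracks VM identifiers instead of calling
    a real hypervisor toolstack."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running: set[str] = set()

    def start_vm(self, node: str, vse: VSE) -> str:
        vm_id = f"{self.name}:{node}:{vse.name}"
        self.running.add(vm_id)
        return vm_id

    def stop_vm(self, vm_id: str) -> None:
        self.running.discard(vm_id)


# One (simulated) driver per supported virtualization solution.
BACKENDS: dict[str, HypervisorBackend] = {
    name: SimulatedBackend(name) for name in ("xen", "kvm", "palacios")
}


def parse_nodefile(text: str) -> list[str]:
    """Return allocated hosts in first-seen order without duplicates
    (assumes one hostname per line, as in a Torque node file)."""
    seen: dict[str, None] = {}
    for line in text.splitlines():
        host = line.strip()
        if host:
            seen.setdefault(host, None)
    return list(seen)


def deploy(vse: VSE, nodes: list[str], hypervisor: str) -> list[str]:
    """Start one VM per node through the selected backend."""
    try:
        backend = BACKENDS[hypervisor]
    except KeyError:
        raise ValueError(f"unsupported hypervisor: {hypervisor!r}") from None
    return [backend.start_vm(node, vse) for node in nodes]


if __name__ == "__main__":
    env = VSE(name="compute-env", image="base.img")
    vms = deploy(env, parse_nodefile("node01\nnode01\nnode02\n"), "kvm")
    print(vms)  # ['kvm:node01:compute-env', 'kvm:node02:compute-env']
```

Keeping the hypervisor choice behind a single string key means the run-time code that talks to the resource manager does not change when a different virtualization solution is selected. Only the registered backend does.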

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1024319
Resource Relation:
Conference: Ottawa Linux Symposium, Ottawa, Canada, June 13-15, 2011
Country of Publication:
United States
Language:
English