OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Software architecture of the light weight kernel, Catamount.

Conference · OSTI ID:970212

Catamount is designed to be a low-overhead operating system for a parallel computing environment. Its functionality is limited to the minimum set needed to run a scientific computation. The design choices and implementations will be presented. A massively parallel processor (MPP) high-performance computing (HPC) system is particularly sensitive to operating system overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To support this range of needs, they provide a large number of system processes, which are often interdependent. The overhead of these processes makes the amount of processor time available to a parallel application unpredictable. Except for the most embarrassingly parallel applications, an MPP application must share interim results with its peers before it can make further progress. These synchronization events occur at specific points in the application code. If one processor takes longer than all the others to reach such a point, every processor must wait, and the overall finish time increases.

Sandia National Laboratories began addressing this problem more than a decade ago with an architecture based on node specialization: sets of nodes in an MPP are designated to perform specific tasks, each running an operating system best suited to its specialized function. Rather than use a multi-purpose operating system on the computational nodes, Sandia developed its first lightweight operating system, SUNMOS, which ran on the compute nodes of the Intel Paragon. Based on its viability, the architecture evolved into the PUMA operating system. Intel ported PUMA to the ASCI Red TFLOPS system, creating the Cougar operating system. Most recently, Cougar has been ported to Cray's XT3 system and renamed Catamount.

As the references indicate, the predecessor operating systems have been described in a number of publications. While most of those discussions still apply to Catamount, this paper takes a fresh look at the architecture as it is currently implemented.
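The abstract argues that, because an MPP application synchronizes with its peers, a delay on any one processor delays every processor. The toy model below is a minimal sketch of that reasoning, not Catamount code. It assumes a barrier at the end of every step and a hypothetical noise model in which, during each step, each processor independently loses `noise_cost` time units to a system process with probability `noise_prob`. The function names and parameters are illustrative only.

```python
import random


def bsp_finish_time(num_procs, num_steps, work_per_step,
                    noise_prob, noise_cost, seed=0):
    """Simulate a bulk-synchronous run (hypothetical model).

    Every step ends at a barrier, so a step lasts as long as its
    slowest processor.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_steps):
        slowest = 0.0
        for _ in range(num_procs):
            t = work_per_step
            # Assumed noise model: with probability noise_prob, a system
            # process steals noise_cost time units from this processor.
            if rng.random() < noise_prob:
                t += noise_cost
            slowest = max(slowest, t)
        total += slowest  # everyone waits for the slowest processor
    return total


def expected_step_time(work_per_step, noise_prob, num_procs, noise_cost):
    """Analytic expectation under the same model: a step is slowed
    whenever at least one of num_procs processors is interrupted."""
    p_any_interrupted = 1.0 - (1.0 - noise_prob) ** num_procs
    return work_per_step + noise_cost * p_any_interrupted


if __name__ == "__main__":
    for n in (1, 64, 1024):
        t = bsp_finish_time(n, 100, 1.0, 0.01, 0.5)
        print(f"{n:5d} procs: simulated {t:.1f}, "
              f"expected {100 * expected_step_time(1.0, 0.01, n, 0.5):.1f}")
```

Under this model, a 1% chance of interruption barely affects a single processor. With 1024 processors, however, almost every step contains at least one interrupted processor, so nearly every step pays the full `noise_cost`. This is why reducing per-node overhead matters more as the machine grows.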

Research Organization:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
970212
Report Number(s):
SAND2005-2781C; TRN: US201003%%435
Resource Relation:
Conference: Proposed for presentation at the Cray User Group held May 16-19, 2005, in Albuquerque, NM.
Country of Publication:
United States
Language:
English

Similar Records

Catamount
Software · March 1, 2004

Achieving high performance on the Intel Paragon
Conference · November 1, 1993

Early experiences and performance of the Intel Paragon
Technical Report · August 1, 1994