Case Study: Setting up and Running a Production Linux Cluster at Pacific Northwest National Laboratory
With the low price and increasing performance of commodity computer hardware, it is important to study whether clusters of relatively inexpensive computers can form a stable system capable of meeting current demands for high-performance, massively parallel computing. A 192-processor cluster was installed to test and develop methods that would make the PC cluster a workable alternative to commercial systems for scientific research. Comparing PC clusters with commercially sold cluster systems made it apparent that the tools for managing a PC cluster as a single system were not as robust or as well integrated as those in many commercial systems. This paper focuses on the problems encountered and the solutions used to stabilize this cluster for both production and development use. These included extra hardware, such as remote power control units and multi-port adapters, to provide remote access to both the system console and system power. A Giganet cLAN fabric was also used to provide a high-speed, low-latency interconnect. Software solutions were used for resource management, job scheduling and accounting, parallel filesystems, remote network installation, and system monitoring. Although some tools for debugging hardware problems are still missing, the PC cluster continues to be very stable and useful for users.
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 965174
- Report Number(s): PNNL-SA-34727; TRN: US200920%%263
- Resource Relation: Conference: Linux Clusters: the HPC Revolution; 2nd LCI Conference, June 25-27, 2001
- Country of Publication: United States
- Language: English