Title: Case Study: Setting up and Running a Production Linux Cluster at Pacific Northwest National Laboratory

Conference
OSTI ID: 965174

With the low price and increasing performance of commodity computer hardware, it is important to study whether clusters of relatively inexpensive computers can form a stable system capable of meeting current demands for high-performance, massively parallel computing. A 192-processor cluster was installed to test and develop methods that would make the PC cluster a workable alternative to commercial systems for scientific research. Comparing PC clusters with commercially sold cluster systems made it apparent that the tools for managing a PC cluster as a single system were not as robust or as well integrated as those in many commercial systems. This paper focuses on the problems encountered and the solutions used to stabilize this cluster for both production and development use. The solutions included extra hardware, such as remote power control units and multi-port adapters, to provide remote access to both the system console and system power. A Giganet cLAN fabric was also used to provide a high-speed, low-latency interconnect. Software solutions were used for resource management, job scheduling and accounting, parallel filesystems, remote network installation, and system monitoring. Although some tools for debugging hardware problems are still missing, the PC cluster continues to be very stable and useful for users.

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
Report Number(s):
PNNL-SA-34727; TRN: US200920%%263
Resource Relation:
Conference: Linux Clusters: the HPC Revolution; 2nd LCI Conference June 25-27, 2001
Country of Publication:
United States
Language:
English