Scalable cluster administration - Chiba City I approach and lessons learned.
Systems administrators of large clusters often need to perform the same administrative activity hundreds or thousands of times. Such activities are often time-consuming, especially installing and maintaining software. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control, cluster administrators can automate all administrative tasks. Scalable cluster administration addresses the following challenge: What systems design techniques can cluster builders use to automate administration on very large clusters? We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster, and we analyze the benefits and limitations of that approach with respect to scalability, flexibility, and reliability.
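One way to avoid hand-editing hundreds of per-node entries is to generate network-boot configuration from a node inventory. The sketch below is a minimal illustration of that idea, not the Chiba City I tooling. It assumes ISC dhcpd `host` declarations that point each node at a TFTP server and a PXE boot file. The node naming scheme (`node001`, ...), the consecutive address layout, the `pxelinux.0` boot file, and the function name `dhcpd_host_stanzas` are all hypothetical.

```python
import ipaddress


def dhcpd_host_stanzas(macs, base_ip, tftp_server,
                       boot_file="pxelinux.0", prefix="node"):
    """Return ISC dhcpd host declarations, one per node (illustrative sketch).

    macs: MAC address strings in node order (hypothetical inventory).
    base_ip: IPv4 address of the first node; later nodes get consecutive addresses.
    tftp_server: address given to nodes as next-server, i.e. where to fetch boot_file.
    """
    start = ipaddress.IPv4Address(base_ip)
    stanzas = []
    for i, mac in enumerate(macs, start=1):
        name = f"{prefix}{i:03d}"   # hypothetical naming scheme: node001, node002, ...
        addr = start + (i - 1)      # consecutive fixed addresses
        stanzas.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {addr};\n"
            f"  next-server {tftp_server};\n"
            f"  filename \"{boot_file}\";\n"
            f"}}\n"
        )
    return "".join(stanzas)


if __name__ == "__main__":
    # Two made-up nodes; a real inventory would list every node's MAC address.
    print(dhcpd_host_stanzas(["00:11:22:33:44:01", "00:11:22:33:44:02"],
                             "10.0.0.1", "10.0.0.254"))
```

Because the output is derived entirely from the inventory, adding or replacing a node means changing one inventory entry and regenerating the file. This is the kind of repetitive per-node task the abstract argues should be automated.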
- Research Organization:
- Argonne National Lab., IL (US)
- Sponsoring Organization:
- US Department of Energy (US)
- DOE Contract Number:
- W-31-109-ENG-38
- OSTI ID:
- 801648
- Report Number(s):
- ANL/MCS/CP-108094; TRN: US200223%%198
- Resource Relation:
- Conference: Cluster 2002, IEEE International Conference on Cluster Computing, Chicago, IL (US), 09/24/2002--09/26/2002; Other Information: PBD: 1 Jul 2002
- Country of Publication:
- United States
- Language:
- English