skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Continuous Security and Configuration Monitoring of HPC Clusters

Technical Report ·
DOI:https://doi.org/10.2172/1184182· OSTI ID:1184182
 [1];  [1];  [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Continuous security and configuration monitoring of information systems has been a time consuming and laborious task for system administrators at the High Performance Computing (HPC) center. Prior to this project, system administrators had to manually check the settings of thousands of nodes, which required a significant number of hours rendering the old process ineffective and inefficient. This paper explains the application of Splunk Enterprise, a software agent, and a reporting tool in the development of a user application interface to track and report on critical system updates and security compliance status of HPC Clusters. In conjunction with other configuration management systems, the reporting tool is to provide continuous situational awareness to system administrators of the compliance state of information systems. Our approach consisted of the development, testing, and deployment of an agent to collect any arbitrary information across a massively distributed computing center, and organize that information into a human-readable format. Using Splunk Enterprise, this raw data was then gathered into a central repository and indexed for search, analysis, and correlation. Following acquisition and accumulation, the reporting tool generated and presented actionable information by filtering the data according to command line parameters passed at run time. Preliminary data showed results for over six thousand nodes. Further research and expansion of this tool could lead to the development of a series of agents to gather and report critical system parameters. However, in order to make use of the flexibility and resourcefulness of the reporting tool the agent must conform to specifications set forth in this paper. This project has simplified the way system administrators gather, analyze, and report on the configuration and security state of HPC clusters, maintaining ongoing situational awareness. Rather than querying each cluster independently, compliance checking can be managed from one central location.

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
DE-AC52-07NA27344
OSTI ID:
1184182
Report Number(s):
LLNL-TR-670919
Country of Publication:
United States
Language:
English