skip to main content

Title: Continuous Security and Configuration Monitoring of HPC Clusters

Continuous security and configuration monitoring of information systems has been a time consuming and laborious task for system administrators at the High Performance Computing (HPC) center. Prior to this project, system administrators had to manually check the settings of thousands of nodes, which required a significant number of hours rendering the old process ineffective and inefficient. This paper explains the application of Splunk Enterprise, a software agent, and a reporting tool in the development of a user application interface to track and report on critical system updates and security compliance status of HPC Clusters. In conjunction with other configuration management systems, the reporting tool is to provide continuous situational awareness to system administrators of the compliance state of information systems. Our approach consisted of the development, testing, and deployment of an agent to collect any arbitrary information across a massively distributed computing center, and organize that information into a human-readable format. Using Splunk Enterprise, this raw data was then gathered into a central repository and indexed for search, analysis, and correlation. Following acquisition and accumulation, the reporting tool generated and presented actionable information by filtering the data according to command line parameters passed at run time. Preliminary data showed resultsmore » for over six thousand nodes. Further research and expansion of this tool could lead to the development of a series of agents to gather and report critical system parameters. However, in order to make use of the flexibility and resourcefulness of the reporting tool the agent must conform to specifications set forth in this paper. This project has simplified the way system administrators gather, analyze, and report on the configuration and security state of HPC clusters, maintaining ongoing situational awareness. Rather than querying each cluster independently, compliance checking can be managed from one central location.« less
 [1] ;  [1] ;  [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Technical Report
Research Org:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org:
Country of Publication:
United States