U.S. Department of Energy
Office of Scientific and Technical Information

GUIDE: A Scalable Information Directory Service to Collect, Federate, and Analyze Logs for Operational Insights into a Leadership HPC Facility

Conference
In this paper, we describe the GUIDE framework to collect, federate, and analyze log data from the Oak Ridge Leadership Computing Facility (OLCF), and how we use it to derive insights into center operations. We collect system logs and extract monitoring data at every level of the various OLCF subsystems, and have developed a suite of pre-processing tools to make the raw data consumable. The cleansed logs are then ingested and federated into Splunk, a central, scalable data warehouse that offers storage, indexing, querying, and visualization capabilities. We have further developed and deployed a set of analytics tools that analyze these disparate log streams in concert to derive operational insights. We describe our experience developing and deploying the GUIDE infrastructure and deriving valuable insights into the various subsystems, based on two years of operation in the production OLCF environment.
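The abstract does not specify how GUIDE's pre-processing or federation is implemented. As a rough illustration only, the sketch below assumes a hypothetical raw line format of `<ISO-8601 timestamp> <host> <source>: <message>`, cleans each stream into structured events (dropping lines that do not parse), and then merges several time-ordered streams into one timeline so they can be examined together. The names `clean`, `federate`, and `LINE_RE`, the hostnames, and the sample lines are all hypothetical and are not GUIDE's or Splunk's API.

```python
import heapq
import json
import re
from datetime import datetime
from typing import Iterable, Iterator

# Hypothetical raw format (an assumption, not the OLCF format):
#   "<ISO-8601 timestamp> <host> <source>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<source>[^:\s]+):\s*(?P<msg>.*)$"
)


def clean(lines: Iterable[str], stream: str) -> Iterator[dict]:
    """Turn raw log lines into structured events, skipping lines that do not parse."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # malformed or truncated line
        try:
            ts = datetime.fromisoformat(m["ts"])
        except ValueError:
            continue  # unparseable timestamp
        yield {
            "time": ts.isoformat(),
            "host": m["host"],
            "source": m["source"],
            "stream": stream,  # which subsystem log this event came from
            "message": m["msg"],
        }


def federate(*streams: Iterable[dict]) -> Iterator[dict]:
    """Lazily merge event streams that are each already in time order."""
    return heapq.merge(*streams, key=lambda e: datetime.fromisoformat(e["time"]))


if __name__ == "__main__":
    # Illustrative sample lines from two hypothetical subsystem logs.
    console = [
        "2016-05-01T12:00:05 c0-0c0s1n2 kernel: LustreError: timeout",
        "garbage",
    ]
    lustre = ["2016-05-01T12:00:01 oss12 lustre: OST0004 slow reply"]
    for event in federate(clean(console, "console"), clean(lustre, "lustre")):
        print(json.dumps(event))  # one JSON object per line
```

Emitting one JSON object per line keeps each cleaned event self-describing, which suits an indexer that ingests structured records; `heapq.merge` is used because it merges pre-sorted streams lazily without loading them all into memory.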
Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization: USDOE
DOE Contract Number: AC05-00OR22725
OSTI ID: 1400206
Country of Publication: United States
Language: English

Similar Records

GUIDE: a scalable information directory service to collect, federate, and analyze logs for operational insights into a leadership HPC facility
Conference · Sat Dec 31 23:00:00 EST 2016 · OSTI ID:1567468

Bridging the gaps : joining information sources with Splunk.
Conference · Fri Oct 01 00:00:00 EDT 2010 · OSTI ID:1028434

Bridging the gaps : joining information sources with Splunk.
Conference · Thu Jul 01 00:00:00 EDT 2010 · OSTI ID:1021695
