U.S. Department of Energy
Office of Scientific and Technical Information

Real-Time System Log Monitoring/Analytics Framework

Conference · OSTI ID:1056901

Analyzing system logs provides useful insights for identifying system and application anomalies and helps make better use of system resources. Nevertheless, it is simply not practical to scan through the raw log messages of large-scale systems on a regular basis. First, the sheer volume of unstructured log messages hurts readability; second, correlating the log messages with system events is a daunting task. These factors limit the use of large-scale system logs primarily to generating alerts on known system events and to post-mortem diagnosis of previously unknown system events that impacted the system's performance. In this paper, we describe a log monitoring framework that enables prompt analysis of system events in real time. Our web-based framework provides a summarized view of console, netwatch, consumer, and apsched logs in real time. The logs are parsed and processed to generate views by application, message type, individual compute node or group of compute nodes, and section of the compute platform. In addition, from past application runs we build a statistical profile of user and application characteristics with respect to known system events, recoverable and non-recoverable error messages, and resources utilized. The web-based tool is being developed for Jaguar XT5 at the Oak Ridge Leadership Computing Facility.
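The parse-and-summarize step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the log formats, so the line layout (`timestamp source node severity message`), the regular expression, the function names `parse_line` and `summarize`, and the sample lines are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical line layout (an assumption, not the real Jaguar log format):
#   <timestamp> <source> <node> <severity> <free-text message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<source>\w+)\s+(?P<node>\S+)\s+"
    r"(?P<severity>\w+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Build two summary views: message counts per log source and
    per-node counts broken down by severity. Unparseable lines are counted."""
    by_source = Counter()
    by_node = defaultdict(Counter)
    skipped = 0
    for line in lines:
        record = parse_line(line)
        if record is None:
            skipped += 1
            continue
        by_source[record["source"]] += 1
        by_node[record["node"]][record["severity"]] += 1
    return {
        "by_source": dict(by_source),
        "by_node": {node: dict(counts) for node, counts in by_node.items()},
        "skipped": skipped,
    }


# Invented sample data, for illustration only.
SAMPLE = [
    "2010-01-01T00:00:01 console c0-0c0s1n0 ERROR machine check detected",
    "2010-01-01T00:00:02 netwatch c0-0c0s1n0 WARN link retry",
    "2010-01-01T00:00:03 apsched c1-0c2s3n1 INFO placed application",
    "garbage",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A real-time variant would presumably feed `summarize` from a stream of incoming lines rather than a list, and a per-user or per-application profile could be built the same way by adding those fields to the assumed layout.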

Research Organization:
Oak Ridge National Laboratory (ORNL); Center for Computational Sciences
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1056901
Country of Publication:
United States
Language:
English

Similar Records

Correlating Log Messages for System Diagnostics
Conference · Thu Dec 31 23:00:00 EST 2009 · OSTI ID:982738

High Performance Computing Facility Operational Assessment, FY 2010 Oak Ridge Leadership Computing Facility
Technical Report · Sun Aug 01 00:00:00 EDT 2010 · OSTI ID:985781

Use of the ERD for administrative monitoring of Theta
Conference · Sun Aug 25 00:00:00 EDT 2019 · Concurrency and Computation (Online) · OSTI ID:1559857
