U.S. Department of Energy
Office of Scientific and Technical Information

Investigating Anomalies in Compute Clusters: An Unsupervised Learning Approach

Conference
DOI: https://doi.org/10.2172/2439937 · OSTI ID: 2439937
 [1];  [1];  [2];  [2];  [2];  [2];  [2];  [2];  [2];  [2];  [1]
  1. College of William and Mary, Williamsburg, VA (United States)
  2. Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
As compute clusters continue to grow in scale and complexity, the frequency of detected anomalies in their operation increases significantly. Timely detection of anomalous events is vital to maintaining system efficiency and availability. This study presents an attention-based graph neural network (GNN) for detecting anomalies in clusters at the compute-node level and for providing detailed root cause analysis. We show the effectiveness of attention-based GNNs in accurately detecting and localizing anomalies on real-world datasets.
Research Organization:
Thomas Jefferson National Accelerator Facility, Newport News, VA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Nuclear Physics (NP)
DOE Contract Number:
AC05-06OR23177
OSTI ID:
2439937
Report Number(s):
JLAB-CST-23-4174; DOE/OR/23177-7647
Country of Publication:
United States
Language:
English

Similar Records

Decode the Workload: Training Deep Learning Models for Efficient Compute Cluster Representation
Conference · October 1, 2024 · OSTI ID: 2489884

Enhancing Network Anomaly Detection Using Graph Neural Networks
Conference · June 11, 2024 · 2024 22nd Mediterranean Communication and Computer Networking Conference (MedComNet) · OSTI ID: 2426922

Enhancing Network Anomaly Detection Using Graph Neural Networks
Conference · June 11, 2024 · OSTI ID: 3003233
