Investigating Anomalies in Compute Clusters: An Unsupervised Learning Approach
- College of William and Mary, Williamsburg, VA (United States)
- Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
As compute clusters continue to grow in scale and complexity, the frequency of detected anomalies in their operation increases significantly. Timely detection of anomalous events is vital to maintaining system efficiency and availability. This study presents an attention-based graph neural network (GNN) for detecting anomalies in clusters at the compute node level and for providing detailed root cause analysis. We show the effectiveness of attention-based GNNs in accurately detecting and localizing anomalies on real-world datasets.
- Research Organization: Thomas Jefferson National Accelerator Facility, Newport News, VA (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Nuclear Physics (NP)
- DOE Contract Number: AC05-06OR23177
- OSTI ID: 2439937
- Report Number(s): JLAB-CST-23-4174; DOE/OR/23177-7647
- Country of Publication: United States
- Language: English
Similar Records
Decode the Workload: Training Deep Learning Models for Efficient Compute Cluster Representation
Enhancing Network Anomaly Detection Using Graph Neural Networks · Conference · 2024-10-01 · OSTI ID: 2489884
Enhancing Network Anomaly Detection Using Graph Neural Networks · Conference · 2024-06-11 · 2024 22nd Mediterranean Communication and Computer Networking Conference (MedComNet) · OSTI ID: 2426922
Enhancing Network Anomaly Detection Using Graph Neural Networks · Conference · 2024-06-11 · OSTI ID: 3003233