Fault Diagnosis of Hybrid Computing Systems Using Chaotic-Map Method
- ORNL
- Los Alamos National Laboratory (LANL)
Computing systems are becoming increasingly complex, with nodes that combine multi-core central processing units (CPUs), many integrated core (MIC) accelerators, and graphics processing unit (GPU) accelerators. These computing units and their interconnections are subject to different classes of hardware and software faults, which must be detected to support mitigation measures. We present the chaotic-map method, which uses the exponential divergence and wide Fourier properties of chaotic-map trajectories, combined with memory allocations and assignments, to diagnose component-level faults in these hybrid computing systems. We propose lightweight codes that use highly parallel chaotic-map computations tailored to isolate faults in arithmetic units, memory elements, and interconnects. The diagnosis module on a node uses pthreads to place chaotic-map threads on CPU and MIC cores, and CUDA C and OpenCL kernels on GPU blocks. We present experimental diagnosis results on five multi-core CPUs, one MIC, and seven GPUs, with typical diagnosis run-times under a minute.
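The role of exponential divergence in fault detection can be illustrated with a minimal sketch. It is not the authors' diagnosis code. It assumes the logistic map x -> a*x*(1 - x) with a = 4 as the chaotic map, and it models a faulty arithmetic unit with a hypothetical one-time additive error (`fault_step`, `fault_size`). All function and parameter names are illustrative. The idea shown is that a fault-free unit reproduces the reference trajectory exactly, while a single tiny arithmetic error makes the trajectory depart from the reference at the step where it occurs.

```python
def logistic_trajectory(x0, steps, a=4.0, fault_step=None, fault_size=0.0):
    """Iterate the logistic map x -> a*x*(1-x) and return the trajectory.

    fault_step / fault_size are hypothetical: they inject one additive
    error to stand in for a faulty arithmetic unit.
    """
    x = x0
    trajectory = []
    for i in range(steps):
        x = a * x * (1.0 - x)
        if i == fault_step:
            x += fault_size  # simulated arithmetic fault (assumption)
        trajectory.append(x)
    return trajectory


def first_divergence(reference, observed):
    """Return the first index where two trajectories differ, or None."""
    for i, (r, o) in enumerate(zip(reference, observed)):
        if r != o:
            return i
    return None


# Reference trajectory, e.g. computed on a unit assumed to be healthy.
reference = logistic_trajectory(0.3, 200)

# A healthy unit repeats the same deterministic computation bit for bit.
healthy = logistic_trajectory(0.3, 200)

# A unit with one tiny error at step 50 no longer matches the reference.
faulty = logistic_trajectory(0.3, 200, fault_step=50, fault_size=1e-12)

print("healthy diverges at:", first_divergence(reference, healthy))
print("faulty diverges at:", first_divergence(reference, faulty))
print("largest gap after fault:",
      max(abs(r - o) for r, o in zip(reference, faulty)))
```

In a node-level setting, one such trajectory would run per core or GPU block, and each result would be compared against the expected values. The sketch above keeps everything in a single process for clarity.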
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1561635
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Fault Diagnosis of Hybrid Computing Systems Using Chaotic-Map Method (Book, Nov 01, 2018, OSTI ID: 1649633)
- Failure detection in high-performance clusters and computers using chaotic map computations (Patent, Aug 31, 2015, OSTI ID: 1213445)
- Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors (Conference, Dec 31, 2009, OSTI ID: 974630)