DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Failure detection in high-performance clusters and computers using chaotic map computations

Abstract

A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing 10^18 floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphical processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.
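
The comparison of chaotic map trajectories described in the abstract can be illustrated with a small sketch. This is not the patented method: the record does not name a specific map, so the logistic map, the parameter r = 4.0, the single bit-flip fault model, the tolerance, and all function names below are assumptions chosen for illustration. The idea shown is only that identical chaotic computations on healthy components should produce identical trajectories, while a small error is amplified until an automated comparison can flag it.

```python
import struct


def logistic_step(x, r=4.0):
    """One iteration of the logistic map x -> r * x * (1 - x) (assumed map)."""
    return r * x * (1.0 - x)


def flip_mantissa_bit(x, bit=20):
    """Simulate a soft error by flipping one mantissa bit of a float64 value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def trajectory(x0, steps, fault_at=None):
    """Iterate the map from x0; optionally inject one bit flip at step fault_at."""
    xs = []
    x = x0
    for i in range(steps):
        x = logistic_step(x)
        if i == fault_at:
            x = flip_mantissa_bit(x)  # hypothetical fault model
        xs.append(x)
    return xs


def first_divergence(reference, observed, tol=1e-6):
    """Return the first index where two trajectories differ by more than tol, else None."""
    for i, (a, b) in enumerate(zip(reference, observed)):
        if abs(a - b) > tol:
            return i
    return None


if __name__ == "__main__":
    reference = trajectory(0.3, 200)             # computed on a trusted component
    healthy = trajectory(0.3, 200)               # same computation, no fault
    faulty = trajectory(0.3, 200, fault_at=10)   # same computation, one bit flip

    print("healthy component diverges at:", first_divergence(reference, healthy))
    print("faulty component diverges at:", first_divergence(reference, faulty))
```

In this sketch the injected error is far smaller than the tolerance when it occurs, so it is flagged only after the chaotic dynamics have amplified it over later iterations. A real deployment would also have to account for legitimate floating-point differences between heterogeneous processors, which this example ignores.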

Inventors:
Rao, Nageswara S.
Issue Date:
2015 Sep 01
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1213445
Patent Number(s):
9122603
Application Number:
13/919,601
Assignee:
UT-Battelle, LLC (Oak Ridge, TN)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Patent
Resource Relation:
Patent File Date: 2013 Jun 17
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Rao, Nageswara S. Failure detection in high-performance clusters and computers using chaotic map computations. United States: N. p., 2015. Web.
Rao, Nageswara S. (2015). Failure detection in high-performance clusters and computers using chaotic map computations. United States.
Rao, Nageswara S. 2015. "Failure detection in high-performance clusters and computers using chaotic map computations". United States. https://www.osti.gov/servlets/purl/1213445.
@article{osti_1213445,
title = {Failure detection in high-performance clusters and computers using chaotic map computations},
author = {Rao, Nageswara S.},
abstractNote = {A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing $10^{18}$ floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphical processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.},
place = {United States},
year = {2015},
month = {9}
}

Works referenced in this record:

Integrated control and diagnostics system
patent, November 2007


Chaos: An Introduction to Dynamical Systems
journal, November 1997


Basic concepts and taxonomy of dependable and secure computing
journal, January 2004


Designing programs that check their work
journal, January 1995


Toward Exascale Resilience
journal, September 2009


The International Exascale Software Project roadmap
journal, January 2011


Quasiperiodic Route to Chaotic Dynamics of Internet Transport Protocols
journal, May 2005


Chaos: A tutorial for engineers
journal, January 1987


Computational complexity issues in operative diagnosis of graph-based systems
journal, April 1993


On Dynamics of Transport Protocols Over Wide-Area Internet Connections
book, January 2005


On polynomial-time testable combinational circuits
journal, January 1994


Fail-stop processors: an approach to designing fault-tolerant computing systems
journal, August 1983