J Supercomput DOI 10.1007/s11227-011-0670-9

A fault-tolerant architecture for parallel applications in tiled-CMPs
Daniel Sánchez · Juan L. Aragón · José M. García
© Springer Science+Business Media, LLC 2011
Abstract Nowadays, hardware reliability is considered a first-class issue along with
performance and energy efficiency. Continued technology scaling and the subsequent
supply voltage reductions, together with temperature fluctuations, increase the
susceptibility of architectures to errors.
With the development of CMPs, interest in parallel applications has
increased. Previous proposals for providing fault detection and recovery have been
mainly based on redundant execution over different cores. RMT (Redundant
Multi-Threading) is a family of techniques based on SMT (Simultaneous Multi-Threading)
processors in which two independent threads (master and slave), fed with the same
inputs, redundantly execute the same instructions in order to detect faults by
checking their outputs. In this paper, we study the under-explored architectural
support for RMT techniques to reliably execute shared-memory applications in tiled-CMPs.
Initially, we show how atomic operations induce serialization points between master
and slave threads, degrading the execution time by 35% for several parallel sci-


Source: Aragón Alcaraz, Juan Luis - Departamento de Ingenieria y Tecnologia de Computadores, Universidad de Murcia


Collections: Computer Technologies and Information Sciences