Summary: LOW-COST HARDWARE FAULT DETECTION AND DIAGNOSIS FOR
MULTICORE SYSTEMS RUNNING MULTITHREADED WORKLOADS
SIVA KUMAR SASTRY HARI
B.Tech., Indian Institute of Technology, Madras, 2007
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Computer Science
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2009
Professor Sarita V. Adve
Continued device scaling is resulting in smaller devices that are increasingly vulnerable to errors
from various sources, e.g., wear-out and high-energy particle strikes. As this reliability threat grows,
future shipped hardware will likely fail due to in-the-field hardware faults. A comprehensive reliability
solution should detect the fault, diagnose its source, and recover correct execution.
Traditional redundancy-based reliability solutions that handle these faults are too expensive for
mainstream computing. A promising approach is using software-level symptoms to detect hardware faults.