Title: Micro rollback on a VLSI RISC

Book
OSTI ID: 5019545

In order to achieve high reliability, computing systems that perform critical tasks must be able to continue normal operation despite component failure, i.e., they must be fault-tolerant. Some methods of achieving fault-tolerance entail a long error recovery time or add considerably to the cycle time because operation cannot proceed until data has been verified. Micro rollback provides rapid restoration of previous system state based on fine-grained checkpointing done in hardware. Operation continues without delay while the data is checked, and if an error is detected a few cycles later, then the system can be rolled back to an error-free state. This paper describes the design and implementation of the UCLA Mirror Processor, a VLSI RISC processor capable of micro rollback. Its main mode of error detection is duplication and comparison. Two processors, a master and a slave, run in lockstep and perform the same operations. The slave processor compares its external signals and a signature of its internal signals with the corresponding signals from the master processor. If an error is detected, the processor state is restored to the beginning of the cycle during which the error occurred, so that correct processor state may be regenerated. Errors detected in the register file are corrected by transferring data from the fault-free processor to the one with the corrupt values. The Mirror Processor architecture, its operation, and its error detection and error recovery features are described, with an emphasis on the physical implementation of the datapath.
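
The mechanism summarized above (per-cycle checkpointing, lockstep comparison with a few cycles of detection latency, and rollback to the start of the faulty cycle) can be illustrated with a small software model. This is only a sketch under stated assumptions and not the Mirror Processor's actual design: the names `RollbackRegisterFile` and `run_lockstep`, the three-field instruction format, the two-cycle comparison latency, and the single transient bit flip are all hypothetical. For simplicity the sketch rolls back both processors and re-executes, rather than modeling the transfer of register data from the fault-free processor.

```python
from collections import deque


class RollbackRegisterFile:
    """Hypothetical register file keeping undo records for the last `depth` cycles."""

    def __init__(self, num_regs=4, depth=4):
        self.regs = [0] * num_regs
        # One undo record per cycle; records older than `depth` cycles are discarded.
        self.undo = deque(maxlen=depth)

    def begin_cycle(self):
        self.undo.append([])

    def write(self, index, value):
        # Save the old value before overwriting (the fine-grained checkpoint).
        self.undo[-1].append((index, self.regs[index]))
        self.regs[index] = value

    def rollback(self, cycles):
        if cycles > len(self.undo):
            raise ValueError("rollback distance exceeds checkpoint depth")
        for _ in range(cycles):
            for index, old in reversed(self.undo.pop()):
                self.regs[index] = old


def run_lockstep(program, fault_cycle=None, latency=2, depth=4):
    """Run (dest, src, imm) instructions, meaning regs[dest] = regs[src] + imm,
    on a master and a slave. Results are compared `latency` cycles late
    (an assumed value); a mismatch rolls both back to the faulty cycle."""
    master = RollbackRegisterFile(depth=depth)
    slave = RollbackRegisterFile(depth=depth)
    outputs = []  # (master_value, slave_value) for each executed cycle
    pc = 0
    rollbacks = 0
    while pc < len(program) + latency:  # extra cycles drain pending comparisons
        master.begin_cycle()
        slave.begin_cycle()
        if pc < len(program):
            dest, src, imm = program[pc]
            m_val = master.regs[src] + imm
            s_val = slave.regs[src] + imm
            if pc == fault_cycle:
                s_val ^= 1          # inject one transient bit flip in the slave
                fault_cycle = None  # the fault does not recur on re-execution
            master.write(dest, m_val)
            slave.write(dest, s_val)
            outputs.append((m_val, s_val))
        else:
            outputs.append((None, None))  # idle cycle, nothing to compare
        checked = pc - latency  # the cycle whose comparison completes now
        if checked >= 0 and outputs[checked][0] != outputs[checked][1]:
            # Undo cycles checked..pc, then resume from the faulty cycle.
            master.rollback(latency + 1)
            slave.rollback(latency + 1)
            del outputs[checked:]
            pc = checked
            rollbacks += 1
            continue
        pc += 1
    return master.regs, slave.regs, rollbacks


program = [(0, 0, 5), (1, 0, 3), (2, 1, 4), (3, 2, 1), (0, 3, 2)]
clean = run_lockstep(program)
faulty = run_lockstep(program, fault_cycle=1)
```

In this model, execution never waits for the comparison: later instructions run while an earlier cycle is still being checked, and the undo history only needs to be at least `latency + 1` cycles deep to cover the detection delay. With the example program, `faulty` ends in the same register state as `clean` after one rollback.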

Country of Publication: United States
Language: English