System and methods for hardware-software cooperative pipeline error detection
Abstract
An error reporting system utilizes a parity checker to receive data results from execution of an original instruction and a parity bit for the data. A decoder receives an error correcting code (ECC) for data resulting from execution of a shadow instruction of the original instruction, and data error correction is initiated on the original instruction result on condition of a mismatch between the parity bit and the original instruction result, and the decoder asserting a correctable error in the original instruction result.
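The decision rule in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the 8-bit data width, the Hamming-style single-error-correcting code, the function names (`parity`, `ecc_encode`, `ecc_decode`, `report`), and the handling of every case other than "parity mismatch plus correctable error" are all hypothetical choices made for the example.

```python
# Illustrative sketch only: data width, code choice, and names are assumptions.

# 1-based codeword positions that carry the 8 data bits in a Hamming-style
# layout (every position that is not a power of two).
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]


def parity(data: int) -> int:
    """Even-parity bit over the data value."""
    return bin(data).count("1") & 1


def ecc_encode(data: int) -> int:
    """Check bits: XOR of the codeword positions of all set data bits."""
    check = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            check ^= pos
    return check


def ecc_decode(data: int, check: int):
    """Return (status, data), flipping one data bit if the syndrome names it."""
    syndrome = ecc_encode(data) ^ check
    if syndrome == 0:
        return "clean", data
    if syndrome in DATA_POSITIONS:
        bit = DATA_POSITIONS.index(syndrome)
        return "correctable", data ^ (1 << bit)
    # Syndrome points at a check bit or at no valid position.
    return "other", data


def report(original_result: int, parity_bit: int, shadow_ecc: int):
    """Correct only when parity mismatches AND the decoder asserts correctable."""
    parity_mismatch = parity(original_result) != parity_bit
    status, corrected = ecc_decode(original_result, shadow_ecc)
    if parity_mismatch and status == "correctable":
        return "corrected", corrected
    if not parity_mismatch and status == "clean":
        return "ok", original_result
    # Assumption: every other combination is reported without correction.
    return "detected", original_result


# The original instruction yields data plus a parity bit; the shadow
# instruction recomputes the same value and contributes the ECC.
data = 0b10110010
p, ecc = parity(data), ecc_encode(data)
print(report(data, p, ecc))             # no fault
print(report(data ^ (1 << 3), p, ecc))  # single-bit fault in the original result
print(report(data ^ 0b11, p, ecc))      # double-bit fault: parity still matches
```

In the double-bit case the decoder on its own would point at a (wrong) single bit, but because the parity bit still matches, the sketch does not initiate correction. That illustrates why the abstract conditions correction on both signals rather than on the decoder alone.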
- Inventors:
- Sullivan, Michael; Hari, Siva; Zimmer, Brian; Tsai, Timothy; Keckler, Stephen W.
- Issue Date:
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1924970
- Patent Number(s):
- 11409597
- Application Number:
- 16/811,499
- Assignee:
- Nvidia Corp. (Santa Clara, CA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B620719
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 03/06/2020
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Sullivan, Michael, Hari, Siva, Zimmer, Brian, Tsai, Timothy, and Keckler, Stephen W. System and methods for hardware-software cooperative pipeline error detection. United States: N. p., 2022. Web.
Sullivan, Michael, Hari, Siva, Zimmer, Brian, Tsai, Timothy, & Keckler, Stephen W. System and methods for hardware-software cooperative pipeline error detection. United States.
Sullivan, Michael, Hari, Siva, Zimmer, Brian, Tsai, Timothy, and Keckler, Stephen W. 2022. "System and methods for hardware-software cooperative pipeline error detection". United States. https://www.osti.gov/servlets/purl/1924970.
@article{osti_1924970,
title = {System and methods for hardware-software cooperative pipeline error detection},
author = {Sullivan, Michael and Hari, Siva and Zimmer, Brian and Tsai, Timothy and Keckler, Stephen W.},
abstractNote = {An error reporting system utilizes a parity checker to receive data results from execution of an original instruction and a parity bit for the data. A decoder receives an error correcting code (ECC) for data resulting from execution of a shadow instruction of the original instruction, and data error correction is initiated on the original instruction result on condition of a mismatch between the parity bit and the original instruction result, and the decoder asserting a correctable error in the original instruction result.},
place = {United States},
year = {2022},
month = {8}
}
Works referenced in this record:
Error Corrections Coding Over Multiple Memory Pages
patent-application, October 2013
- Anholt, Micha; Ordentlich, Or; Sommer, Naftali
- US Patent Application 13/921,446; 2013/0283122 A1
Integrated circuit device and method for reducing SRAM leakage
patent-application, February 2017
- Engin, Nur; Kapoor, Ajay
- US Patent Application 14/820417; 20170039103
Arithmetic Error Codes: Cost and Effectiveness Studies for Application in Digital System Design
journal, November 1971
- Avizienis, A.
- IEEE Transactions on Computers, Vol. C-20, Issue 11
Method and apparatus for providing error correction within a register file of a CPU
patent, June 2006
- Tremblay, Marc; Chaudhry, Shailender; Jacobson, Quinn A.
- US Patent Document 7,058,877
Processor register error correction management
patent, December 2016
- Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.
- US Patent Document 9,529,653
Tolerating soft errors by selective duplication
patent-application, December 2011
- Elnozahy, Elmootazbellah Nabil; Stephenson, Mark William
- US Patent Application 12/788968; 20110296228
Method and apparatus for providing error correction within a register file of a CPU
patent-application, November 2003
- Tremblay, Marc; Chaudhry, Shailender; Jacobson, Quinn A.
- US Patent Application 10/146100; 20030217325
Conditional branch execution in a processor having a write-tie instruction and a data mover engine that associates register addresses with memory addresses
patent, May 2010
- Thekkath, Radhika; Kishore, Karagada Ramarao; Rajagopalan, Vidya
- US Patent Document 7,721,075
System and apparatus for error-correcting register files
patent, October 2012
- Bybell, Anthony J.; Mitchell, Michael B.; Sullivan, Jason M.
- US Patent Document 8,301,992
System and method for testing a logic-based processing device
patent-application, December 2015
- Lin, Hai; Mitra, Subhasish
- US Patent Application 14/318976; 20150377961
System and apparatus for error-correcting register files
patent-application, February 2011
- Bybell, Anthony J.; Mitchell, Michael B.; Sullivan, Jason M.
- US Patent Application 12/537890; 20110035643
Semiconductor Memory Devices and Memory Systems Including the Same
patent-application, April 2017
- Cha, Sang-Uhn; Chung, Hoi-Ju
- US Patent Application 15/204,536; 2017/0109231 A1
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
conference, March 2007
- Wang, Cheng; Kim, Ho-seop; Wu, Youfeng
- International Symposium on Code Generation and Optimization (CGO'07)
Error detection by duplicated instructions in super-scalar processors
journal, March 2002
- Oh, N.; Shirvani, P. P.; McCluskey, E. J.
- IEEE Transactions on Reliability, Vol. 51, Issue 1
Error correction for flash memory
patent-application, May 2010
- Nazarian, Hagop; Hou, Ping
- US Patent Application 12/267017; 20100122146
Error correction code memory system with a small footprint and byte write operation
patent, November 2009
- Nelson, Michael D.; Fallside, Hamish T.
- US Patent Document 7,620,875
Error correction method with instruction level rollback
patent, January 2012
- Hirotsu, Teppei; Yamada, Hiromichi; Sakata, Teruaki
- US Patent Document 8,095,825
Fault Free Store Data Path for Software Implementation of Redundant Multithreading Environments
patent-application, July 2006
- Mukherjee, Shubhendu S.; Cohn, Robert
- US Patent Application 11/022,600; 2006/0156123 A1
Real-world design and evaluation of compiler-managed GPU redundant multithreading
conference, June 2014
- Wadden, Jack; Lyashevsky, Alexander; Gurumurthi, Sudhanva
- 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection
conference, October 2018
- Sullivan, Michael B.; Hari, Siva Kumar Sastry; Zimmer, Brian
- 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Error detection method and system for processors that employ alternating threads
patent-application, June 2005
- Safford, Kevin D.; Soltis, Jr., Donald C.; Undy, Stephen R.
- US Patent Application 10/714258; 20050138478
Understanding software approaches for GPGPU reliability
conference, March 2009
- Dimitrov, Martin; Mantor, Mike; Zhou, Huiyang
- Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Instruction and logic for support of code modification
patent-application, October 2015
- Kelm, John H.; Keppel, David P.; Mackintosh, David N.
- US Patent Application 14/229161; 20150277915