Fault Injection for TensorFlow Applications
Journal Article · IEEE Transactions on Dependable and Secure Computing
- University of British Columbia, Vancouver, BC (Canada)
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- University of Iowa, Iowa City, IA (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application’s resilience. In this work, we present TensorFI 1 and TensorFI 2, high-level fault injection (FI) frameworks for TensorFlow-based applications. TensorFI 1 and TensorFI 2 can inject both hardware and software faults into any general TensorFlow 1 or TensorFlow 2 program, respectively. Both are configurable FI tools that are flexible, easy to use, and portable. They can be integrated into existing TensorFlow programs to assess their resilience to different fault types (e.g., bit-flips in particular operations or layers). We use TensorFI 1 and TensorFI 2 to evaluate the resilience of 12 and 10 ML programs, respectively, written in TensorFlow, including DNNs used in the autonomous vehicle domain. The results give us insights into why some of the models are more resilient than others. We also measure the performance overheads of the two injectors, and present four case studies, two for each tool, to demonstrate their utility.
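To make the bit-flip fault type mentioned in the abstract concrete, the following is a minimal sketch of a single-bit-flip fault model applied to float32 values, written with the Python standard library only. It is not the TensorFI API and does not reproduce how either tool is implemented; the function names `flip_bit_float32` and `inject_bit_flip` are hypothetical and introduced purely for illustration.

```python
import random
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, encoded as an IEEE 754 float32, with one bit flipped.

    Bit 0 is the least significant mantissa bit; bit 31 is the sign bit.
    Hypothetical helper for illustration, not part of TensorFI.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask flips exactly one bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def inject_bit_flip(values, rng):
    """Copy `values` and flip one random bit in one randomly chosen element.

    Returns the faulty copy together with the chosen index and bit position,
    so that an experiment can log where the fault was injected.
    """
    faulty = list(values)
    index = rng.randrange(len(faulty))
    bit = rng.randrange(32)
    faulty[index] = flip_bit_float32(faulty[index], bit)
    return faulty, index, bit


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the experiment is repeatable
    activations = [0.5, 1.0, 2.0]  # stand-in for the output of one operation
    faulty, index, bit = inject_bit_flip(activations, rng)
    print(f"flipped bit {bit} of element {index}: {activations} -> {faulty}")
```

In this sketch the list of floats stands in for the output of a single operation or layer. The severity of a fault depends heavily on which bit is hit: flipping the sign bit of 1.0 yields -1.0, while flipping the highest exponent bit yields infinity.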
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- 89233218CNA000001; AC05-76RL01830
- OSTI ID:
- 1994506
- Report Number(s):
- LA-UR-21-22618; PNNL-SA-161122
- Journal Information:
- Journal Name: IEEE Transactions on Dependable and Secure Computing; Journal Volume: 20; Journal Issue: 4; ISSN 1545-5971
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English