Exploring the feasibility of lossy compression for PDE simulations
- Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
- Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Checkpoint restart plays an important role in high-performance computing (HPC) applications. It allows a simulation's runtime to extend beyond a single job allocation and makes it possible to recover from hardware failure. Yet, as machines grow in size and complexity, traditional approaches to checkpoint restart are becoming prohibitive. Current methods store a subset of the application's state and exploit the memory hierarchy of the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, has the potential to reduce the cost of checkpoint restart. This article investigates using numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of lossy compression for checkpointing PDE simulations. Restart from a lossy-compressed checkpoint is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that the error introduced into application variables by restarting from a lossy-compressed checkpoint can be masked by the numerical error of the discretization. This makes checkpoint restart more efficient without affecting the overall accuracy of the simulation.
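The central argument of the abstract is that a bounded perturbation introduced at restart is harmless when it stays below the discretization error. The sketch below illustrates this on a toy problem. It is not PlasComCM or Nek5000 and does not use SZ or ZFP. It assumes a 1D heat equation solved with an explicit finite-difference (FTCS) scheme and a hypothetical uniform-quantization compressor (`quantize`/`dequantize`). As a further assumption, the compressor's error bound is set to one tenth of the measured discretization error.

```python
import numpy as np


def quantize(u, abs_err):
    """Hypothetical toy error-bounded compressor: uniform scalar quantization.

    Each value becomes the nearest integer multiple of 2*abs_err, so the
    pointwise reconstruction error is at most abs_err (up to roundoff).
    Production compressors add prediction and entropy coding on top.
    """
    step = 2.0 * abs_err
    codes = np.round(u / step).astype(np.int64)
    return codes, step


def dequantize(codes, step):
    """Reconstruct approximate values from integer codes."""
    return codes.astype(np.float64) * step


def heat_step(u, r):
    """One explicit FTCS step of u_t = u_xx with fixed (zero) Dirichlet ends."""
    un = u.copy()
    un[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return un


def simulate(n=64, r=0.4, t_end=0.1, t_ckpt=0.05, abs_err=None):
    """Run to t_end; if abs_err is given, restart at t_ckpt from a lossy checkpoint."""
    x = np.linspace(0.0, 1.0, n + 1)
    dx = x[1] - x[0]
    dt = r * dx * dx                      # r <= 0.5 keeps FTCS stable
    nsteps = int(round(t_end / dt))
    nckpt = int(round(t_ckpt / dt))
    u = np.sin(np.pi * x)
    u[0] = u[-1] = 0.0
    bits = None
    for k in range(nsteps):
        if k == nckpt and abs_err is not None:
            # Simulated fail-stop: discard u and continue from the
            # reconstructed (lossy) checkpoint instead.
            codes, step = quantize(u, abs_err)
            bits = int(np.ceil(np.log2(codes.max() - codes.min() + 1)))
            u = dequantize(codes, step)
        u = heat_step(u, r)
    exact = np.exp(-np.pi**2 * nsteps * dt) * np.sin(np.pi * x)
    return u, exact, bits


# Reference run without any restart: its deviation from the exact
# solution is the discretization error.
u_ref, exact, _ = simulate()
disc_err = np.max(np.abs(u_ref - exact))

# Assumed policy: compression error bound well below the discretization error.
abs_err = 0.1 * disc_err
u_lossy, _, bits = simulate(abs_err=abs_err)
restart_err = np.max(np.abs(u_lossy - u_ref))  # error caused by the lossy restart

print(f"discretization error: {disc_err:.3e}")
print(f"lossy-restart error:  {restart_err:.3e}")
print(f"quantization codes fit in {bits} bits per value (vs. 64 for float64)")
```

Because FTCS is linear and, for r <= 0.5, each update is a convex combination of neighboring values, the checkpoint perturbation cannot grow in the max norm. The restart error therefore stays below `abs_err`, and hence below the discretization error. A real application would need an a priori estimate of the truncation error rather than the measured value used here.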
- Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization: Air Force Research Laboratory (AFRL), Air Force Office of Scientific Research (AFOSR); National Science Foundation (NSF); USDOE; USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number: AC02-06CH11357; NA0002374
- OSTI ID: 1425688
- Alternate ID(s): OSTI ID 1510066
- Journal Information: International Journal of High Performance Computing Applications, Vol. 33, Issue 2; ISSN 1094-3420
- Publisher: SAGE Publications
- Country of Publication: United States
- Language: English