U.S. Department of Energy
Office of Scientific and Technical Information

Exploring the feasibility of lossy compression for PDE simulations

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [3];  [3];  [3]
  1. Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
  2. Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA
  3. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Checkpoint restart plays an important role in high-performance computing (HPC) applications, allowing simulation runtime to extend beyond a single job allocation and facilitating recovery from hardware failure. Yet, as machines grow in size and complexity, traditional approaches to checkpoint restart are becoming prohibitively expensive. Current methods store a subset of the application’s state and exploit the memory hierarchy of the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, has the potential to reduce the computational cost of checkpoint restart. This article investigates the use of numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of using lossy compression in checkpointing PDE simulations. Restart from a lossy compressed checkpoint is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that the error in application variables due to a restart from a lossy compressed checkpoint can be masked by the numerical error in the discretization, leading to increased efficiency in checkpoint restart without affecting the overall accuracy of the simulation.
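
The masking argument in the abstract can be illustrated on a toy problem. The sketch below is not taken from the article and does not involve PlasComCM or Nek5000. It assumes a 1D heat equation solved with an explicit finite-difference scheme, and it stands in for an error-bounded lossy compressor with simple uniform quantization. All function names and parameter values (`step`, `lossy_roundtrip`, `run`, the bound `eps`) are hypothetical choices made for illustration.

```python
import math

def step(u, r):
    """One explicit (FTCS) step of u_t = u_xx with fixed zero boundaries."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

def lossy_roundtrip(u, eps):
    """Stand-in for an error-bounded lossy compressor (assumption):
    snap each value to a grid of width 2*eps, so the pointwise error is
    at most eps up to floating-point rounding."""
    return [round(v / (2.0 * eps)) * (2.0 * eps) for v in u]

def run(n, r, steps, checkpoint_step=None, eps=0.0):
    """Advance u(x, 0) = sin(pi x) on [0, 1]. If checkpoint_step is set,
    the state at that step is replaced by its lossy round trip, which
    mimics a restart from a lossy compressed checkpoint."""
    dx = 1.0 / n
    u = [math.sin(math.pi * i * dx) for i in range(n + 1)]
    u[0] = u[-1] = 0.0
    for k in range(steps):
        if k == checkpoint_step:
            u = lossy_roundtrip(u, eps)
        u = step(u, r)
    return u

# Hypothetical parameters: 32 intervals, r = dt / dx^2 = 0.4 (stable for r <= 0.5).
n, r, steps, eps = 32, 0.4, 200, 1e-6
t_final = steps * r / n**2
exact = [math.exp(-math.pi**2 * t_final) * math.sin(math.pi * i / n)
         for i in range(n + 1)]

clean = run(n, r, steps)                                   # uninterrupted run
restarted = run(n, r, steps, checkpoint_step=100, eps=eps) # lossy restart

# Discretization error of the uninterrupted run versus the exact solution.
disc_err = max(abs(a - b) for a, b in zip(clean, exact))
# Extra error introduced only by the lossy restart.
restart_diff = max(abs(a - b) for a, b in zip(clean, restarted))

print(f"discretization error: {disc_err:.3e}")
print(f"restart difference:   {restart_diff:.3e}")
```

With these settings the perturbation from the lossy restart stays within the compression bound, because each step of this scheme forms a convex combination of neighboring values and so cannot amplify a pointwise difference. The discretization error of the scheme itself is much larger, which is the sense in which the compression error is masked. Whether the same holds for a production code depends on its discretization and on the chosen error bound.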

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
Air Force Research Laboratory (AFRL), Air Force Office of Scientific Research (AFOSR); National Science Foundation (NSF); USDOE; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-06CH11357; NA0002374
OSTI ID:
1425688
Alternate ID(s):
OSTI ID: 1510066
Journal Information:
International Journal of High Performance Computing Applications, Vol. 33, Issue 2; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English

