Stability-preserving Lossy Compression for Large-scale Partial Differential Equations
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN
- Brown University, Providence, RI
- University of Oregon
- University of Kentucky
- University of Florida
- New Jersey Institute of Technology
Checkpoint/Restart (C/R) strategies are vital for fault tolerance in scientific simulations based on Partial Differential Equations (PDEs), yet traditional checkpointing incurs significant I/O overhead. Lossy compression offers a scalable solution by reducing checkpoint data size, but conventional methods often lack control over physical invariants (e.g., energy), which can destabilize PDE systems after restart, producing oscillations or divergence. This paper introduces a stability-preserving compression approach tailored to PDE simulations that explicitly controls kinetic and potential energy perturbations to ensure stable restarts. Extensive experiments across diverse PDE configurations demonstrate that our method maintains numerical stability with minimal error magnification, even across multiple checkpoint-restart cycles, and outperforms state-of-the-art lossy compressors. Parallel evaluations on the Frontier supercomputer show up to 8.4× faster checkpoint writes and 6.3× faster checkpoint reads, while relative L2 errors remain around 2×10⁻⁶ throughout the continued simulation. These results provide practical guidance for balancing compression accuracy, stability, and computational efficiency in large-scale PDE applications.
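The abstract does not specify how the kinetic and potential energy perturbations are controlled, so the following is only a rough illustration of the general idea, not the paper's algorithm. It assumes a 1D periodic wave equation u_tt = c² u_xx with checkpoint state (u, v = u_t), uses plain uniform quantization as a stand-in for an error-bounded lossy compressor (no entropy coding, so no actual size reduction is shown), and halves the error bound until the relative changes in discrete kinetic and potential energy fall below a tolerance. The function names, the halving strategy, and the default tolerance are hypothetical.

```python
import numpy as np


def quantize(x, eb):
    # Uniform scalar quantization with absolute error bound eb:
    # each reconstructed value lies within eb of the original.
    return np.round(x / (2.0 * eb)) * (2.0 * eb)


def energies(u, v, c, dx):
    # Discrete energies of u_tt = c^2 u_xx on a periodic grid (assumed model):
    # kinetic = 0.5 * sum(v^2) * dx, potential = 0.5 * c^2 * sum((du/dx)^2) * dx.
    kinetic = 0.5 * np.sum(v**2) * dx
    grad = (np.roll(u, -1) - u) / dx  # forward difference, periodic wrap-around
    potential = 0.5 * c**2 * np.sum(grad**2) * dx
    return kinetic, potential


def rel_l2(a, b):
    # Relative L2 error of reconstruction b with respect to reference a.
    return np.linalg.norm(a - b) / np.linalg.norm(a)


def compress_with_energy_control(u, v, c, dx, eb=1e-2, tol=1e-6, max_iter=40):
    # Hypothetical feedback loop: tighten the error bound until both
    # relative energy perturbations are within tol.
    ke0, pe0 = energies(u, v, c, dx)
    if ke0 == 0.0 or pe0 == 0.0:
        raise ValueError("sketch assumes nonzero initial kinetic and potential energy")
    for _ in range(max_iter):
        uq, vq = quantize(u, eb), quantize(v, eb)
        ke, pe = energies(uq, vq, c, dx)
        dke = abs(ke - ke0) / ke0
        dpe = abs(pe - pe0) / pe0
        if dke <= tol and dpe <= tol:
            return uq, vq, eb, dke, dpe
        eb *= 0.5  # halve the bound and try again
    raise RuntimeError("energy tolerance not met within max_iter halvings")
```

A possible usage, on a single standing-wave mode:

```python
x = np.linspace(0.0, 1.0, 256, endpoint=False)
dx = x[1] - x[0]
u = np.sin(2 * np.pi * x)
v = 2 * np.pi * np.cos(2 * np.pi * x)
uq, vq, eb, dke, dpe = compress_with_energy_control(u, v, c=1.0, dx=dx)
print(eb, dke, dpe, rel_l2(u, uq))
```

Note that the potential energy depends on the gradient, which amplifies pointwise errors by roughly 1/dx; in this sketch that term typically drives the final error bound, which is one reason a plain pointwise error bound alone need not preserve energy.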
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 3030485
- Resource Type: Conference paper/presentation
- Conference Information: International Conference for High Performance Computing, Networking, Storage and Analysis (SC '25), St. Louis, Missouri, United States of America, 11/16/2025-11/21/2025
- Country of Publication: United States
- Language: English
Similar Records
- Exploring the feasibility of lossy compression for PDE simulations
- McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression · Journal Article · Mar 11, 2018 · International Journal of High Performance Computing Applications · OSTI ID:1425688
- McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression · Journal Article · Dec 31, 2012 · Scientific Programming · OSTI ID:1197891