U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints

Journal Article · IEEE Transactions on Parallel and Distributed Systems

Vast volumes of data are produced by today's scientific simulations and advanced instruments. These data cannot be stored and transferred efficiently because of limited I/O bandwidth, network speed, and storage capacity. Error-bounded lossy compression can be an effective method for addressing these issues: not only can it significantly reduce data size, but it can also control the data distortion based on user-defined error bounds. In practice, many scientific applications have specific requirements or constraints for lossy compression, in order to guarantee that the reconstructed data are valid for post hoc analysis. For example, some datasets contain irrelevant data that should be isolated, and users often have intuition regarding the value ranges, geospatial regions, and other data subsets that are crucial for subsequent analysis. Existing state-of-the-art error-bounded lossy compressors, however, do not consider these constraints during compression. The result is inferior compression ratios with respect to the user's post hoc analysis, because the irrelevant data themselves provide little or no value for that analysis. In this work we address this issue by proposing an optimized framework that can preserve diverse constraints during error-bounded lossy compression, e.g., cleaning the irrelevant data, efficiently preserving different precision for multiple value intervals, and allowing users to set diverse precision over both regular and irregular regions. We perform our evaluation on a supercomputer with up to 2,100 cores. Experiments with six real-world applications show that our proposed diverse-constraints-based error-bounded lossy compressor can obtain higher visual quality or data fidelity on reconstructed data with the same or even higher compression ratios compared with the traditional state-of-the-art compressor SZ. Furthermore, our experiments also demonstrate very good scalability in compression performance compared with the I/O throughput of the parallel file system.
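To make the idea of "different precision for multiple value intervals" concrete, the following is a minimal sketch, not the framework described in the article and not the SZ API. It assumes a hypothetical rule format of (low, high, error_bound) triples and uses plain uniform scalar quantization, storing the rule index next to each quantization code so that the decoder knows which bound was applied. All names (`Rule`, `pick_rule`, `compress`, `decompress`) are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical rule format: values with low <= v < high must be
# reconstructed within +/- error_bound of the original value.
Rule = Tuple[float, float, float]


def pick_rule(value: float, rules: List[Rule]) -> Optional[int]:
    """Return the index of the first rule whose interval contains value."""
    for i, (low, high, _) in enumerate(rules):
        if low <= value < high:
            return i
    return None


def compress(data: List[float], rules: List[Rule],
             default_bound: float) -> List[Tuple[int, int]]:
    """Quantize each value with the error bound of its value interval.

    Each output pair is (rule_index, code); rule_index is -1 when no
    rule matches and the default bound is used instead.
    """
    encoded = []
    for v in data:
        idx = pick_rule(v, rules)
        eb = default_bound if idx is None else rules[idx][2]
        # Bins of width 2*eb keep the reconstruction within eb of v
        # (up to floating-point rounding).
        code = round(v / (2.0 * eb))
        encoded.append((-1 if idx is None else idx, code))
    return encoded


def decompress(encoded: List[Tuple[int, int]], rules: List[Rule],
               default_bound: float) -> List[float]:
    """Map each (rule_index, code) pair back to the center of its bin."""
    decoded = []
    for idx, code in encoded:
        eb = default_bound if idx < 0 else rules[idx][2]
        decoded.append(code * 2.0 * eb)
    return decoded


# Illustrative data: tight precision in [0, 1), looser in [1, 10),
# and a coarse default bound everywhere else.
rules = [(0.0, 1.0, 0.001), (1.0, 10.0, 0.1)]
default_bound = 5.0
data = [0.013, 0.52, 0.97, 5.4, 120.7]

encoded = compress(data, rules, default_bound)
decoded = decompress(encoded, rules, default_bound)
```

A larger error bound yields fewer distinct integer codes, which a later entropy-coding stage (omitted here) could compress more effectively. That is the intuition behind relaxing precision where the user does not need it.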

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); Exascale Computing Project (ECP)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1908133
Journal Information:
Journal Name: IEEE Transactions on Parallel and Distributed Systems; Journal Volume: 33; Journal Issue: 12; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
