U.S. Department of Energy
Office of Scientific and Technical Information

Ultrafast Error-bounded Lossy Compression for Scientific Datasets

Conference

Today's scientific high-performance computing applications and advanced instruments produce vast volumes of data across a wide range of domains, which imposes a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely adopted in the scientific community because it can both significantly reduce data volumes and strictly control data distortion according to a user-specified error bound. Existing lossy compressors, however, cannot offer the ultrafast compression speed demanded by numerous applications and use cases (such as in-memory compression and online instrument data compression). In this paper, we propose a novel ultrafast error-bounded lossy compressor that achieves high compression speed on both CPUs and GPUs while maintaining reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework, called SZx, that achieves ultrafast performance through a novel design comprising only lightweight operations, such as bitwise operations and addition/subtraction, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize its performance for each architecture. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2 to 16 times faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, for both compression and decompression.
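
To make the idea of an error-bounded compressor built from lightweight operations more concrete, the following is a minimal sketch in Python. It is not the SZx algorithm described in the paper: the block-wise scheme, the function names `compress_blocks` and `decompress_blocks`, and the default block size of 128 are all assumptions chosen for illustration. The sketch only demonstrates the contract stated in the abstract, namely that every reconstructed value stays within the user-specified error bound of the original, using nothing heavier than comparisons, additions, and subtractions.

```python
import math


def compress_blocks(data, error_bound, block_size=128):
    """Toy block-wise error-bounded compressor (illustrative only, not SZx).

    A block whose value range fits inside 2 * error_bound is replaced by a
    single midpoint value; any other block is stored unchanged.
    """
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        lo, hi = min(block), max(block)
        if hi - lo <= 2 * error_bound:
            # Every value lies within error_bound of the midpoint
            # (up to floating-point rounding of the midpoint itself).
            blocks.append(("const", len(block), lo + (hi - lo) / 2))
        else:
            # Hypothetical fallback: keep the block verbatim (lossless).
            blocks.append(("raw", len(block), list(block)))
    return blocks


def decompress_blocks(blocks):
    """Rebuild the full sequence from the block list."""
    out = []
    for kind, count, payload in blocks:
        if kind == "const":
            out.extend([payload] * count)
        else:
            out.extend(payload)
    return out


# Example: a nearly flat region followed by a steep ramp.
flat = [1.0 + 1e-4 * math.sin(i) for i in range(256)]
ramp = [100.0 * i / 255 for i in range(256)]
values = flat + ramp

packed = compress_blocks(values, error_bound=1e-3)
restored = decompress_blocks(packed)
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

In this example the flat region collapses to one value per block while the ramp is stored as-is, and `max_error` stays below the requested bound. A production compressor would also encode the blocks that are stored verbatim here, and would need to account for floating-point rounding when enforcing a strict bound.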

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Exascale Computing Project (ECP); USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1903841
Resource Relation:
Conference: 31st International ACM Symposium on High Performance Parallel and Distributed Computing, 06/27/22 - 07/01/22, Minneapolis, MN, US
Country of Publication:
United States
Language:
English
