Title: Use cases of lossy compression for floating-point data in scientific data sets

Journal Article · International Journal of High Performance Computing Applications
[1]; [2]; [3]; [3]; [4]; [5]; [6]; [7]; [2]; [7]
  1. Argonne National Lab. (ANL), Lemont, IL (United States); Univ. of Illinois, Urbana-Champaign, IL (United States)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. Univ. of California, Riverside, CA (United States)
  4. Northwestern Univ., Evanston, IL (United States)
  5. Univ. of Alabama, Tuscaloosa, AL (United States)
  6. SLAC National Accelerator Lab., Menlo Park, CA (United States)
  7. Univ. of Chicago, IL (United States)

Architectural and technological trends in scientific computing systems call for a significant reduction in the size of scientific data sets that are composed mainly of floating-point data. This article surveys the currently identified use cases of generic lossy compression for addressing the different limitations of scientific computing systems and presents experimental results for them. Drawing on a collection of experiments run on the parallel systems of a leadership facility, it shows that lossy data compression not only can reduce the storage footprint of scientific data sets but also can reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than would be possible without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of high performance scientific computing. Because the constraints for each use case are different and often conflicting, the results also indicate the need for more specialization of the compression pipelines.

Research Organization:
SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States); Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Exascale Computing Project; National Science Foundation (NSF)
Grant/Contract Number:
AC02-76SF00515; AC02-06CH11357
Alternate ID(s):
OSTI ID: 1575218
Journal Information:
International Journal of High Performance Computing Applications, Vol. 33, Issue 6; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Citation Metrics:
Cited by: 52 works
Citation information provided by Web of Science

Cited By (2)

Significantly improving lossy compression quality based on an optimized hybrid prediction model
  • Liang, Xin; Di, Sheng; Li, Sihuan
  • SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
conference November 2019
Compression Challenges in Large Scale Partial Differential Equation Solvers journal September 2019
