DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on July 9, 2020

Title: Use cases of lossy compression for floating-point data in scientific data sets

Abstract

Architectural and technological trends of systems used for scientific computing call for a significant reduction of scientific data sets that are composed mainly of floating-point data. This article surveys currently identified use cases of generic lossy compression that address the different limitations of scientific computing systems, and presents experimental results for them. Drawing on a collection of experiments run on parallel systems of a leadership facility, the article shows that lossy data compression can not only reduce the footprint of scientific data sets on storage but also reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of high performance scientific computing. Because the constraints for each use case are different and often conflicting, this collection of results also indicates the need for more specialization of the compression pipelines.

Authors:
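Cappello, Franck [1]; Di, Sheng [2]; Li, Sihuan [3]; Liang, Xin [3]; Gok, Ali Murat [4]; Tao, Dingwen [5]; Yoon, Chun Hong [6]; Wu, Xin-Chuan [7]; Alexeev, Yuri [2]; Chong, Frederic T. [7]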
  1. Argonne National Lab. (ANL), Lemont, IL (United States); Univ. of Illinois, Urbana-Champaign, IL (United States)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. Univ. of California, Riverside, CA (United States)
  4. Northwestern Univ., Evanston, IL (United States)
  5. Univ. of Alabama, Tuscaloosa, AL (United States)
  6. SLAC National Accelerator Lab., Menlo Park, CA (United States)
  7. Univ. of Chicago, IL (United States)
Publication Date: July 2019
Research Org.:
SLAC National Accelerator Lab., Menlo Park, CA (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE Exascale Computing Project; National Science Foundation (NSF)
OSTI Identifier:
1560791
Alternate Identifier(s):
OSTI ID: 1575218
Grant/Contract Number:  
AC02-76SF00515; AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 33; Journal Issue: 6; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Lossy compression; floating-point data; scientific data set; applications; use cases

Citation Formats

Cappello, Franck, Di, Sheng, Li, Sihuan, Liang, Xin, Gok, Ali Murat, Tao, Dingwen, Yoon, Chun Hong, Wu, Xin-Chuan, Alexeev, Yuri, and Chong, Frederic T. Use cases of lossy compression for floating-point data in scientific data sets. United States: N. p., 2019. Web. doi:10.1177/1094342019853336.
Cappello, Franck, Di, Sheng, Li, Sihuan, Liang, Xin, Gok, Ali Murat, Tao, Dingwen, Yoon, Chun Hong, Wu, Xin-Chuan, Alexeev, Yuri, & Chong, Frederic T. Use cases of lossy compression for floating-point data in scientific data sets. United States. doi:10.1177/1094342019853336.
Cappello, Franck, Di, Sheng, Li, Sihuan, Liang, Xin, Gok, Ali Murat, Tao, Dingwen, Yoon, Chun Hong, Wu, Xin-Chuan, Alexeev, Yuri, and Chong, Frederic T. "Use cases of lossy compression for floating-point data in scientific data sets". United States. doi:10.1177/1094342019853336.
@article{osti_1560791,
title = {Use cases of lossy compression for floating-point data in scientific data sets},
author = {Cappello, Franck and Di, Sheng and Li, Sihuan and Liang, Xin and Gok, Ali Murat and Tao, Dingwen and Yoon, Chun Hong and Wu, Xin-Chuan and Alexeev, Yuri and Chong, Frederic T.},
abstractNote = {Architectural and technological trends of systems used for scientific computing call for a significant reduction of scientific data sets that are composed mainly of floating-point data. Here, this article surveys and presents experimental results of currently identified use cases of generic lossy compression to address the different limitations of scientific computing systems. The article shows from a collection of experiments run on parallel systems of a leadership facility that lossy data compression not only can reduce the footprint of scientific data sets on storage but also can reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than without lossy compression. In conclusion, these results suggest that lossy compression will become an important technology in many aspects of high performance scientific computing. Because the constraints for each use case are different and often conflicting, this collection of results also indicates the need for more specialization of the compression pipelines.},
doi = {10.1177/1094342019853336},
journal = {International Journal of High Performance Computing Applications},
number = 6,
volume = 33,
place = {United States},
year = {2019},
month = {7}
}
