DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Z-checker: A framework for assessing lossy compression of scientific data

Abstract

Because of the vast volume of data produced by today's scientific simulations and experiments, lossy data compressors that allow user-controlled loss of accuracy during compression are a relevant solution for significantly reducing data size. However, lossy compressor developers and users lack a tool for exploring the features of scientific data sets and understanding how the data are altered by compression in a systematic and reliable way. To address this gap, we have designed and implemented a generic framework called Z-checker. On the one hand, Z-checker combines a battery of data analysis components for data compression. On the other hand, Z-checker is implemented as an open-source community tool to which users and developers can contribute new analysis components as their analysis needs evolve. In this study, we present a survey of existing lossy compressors. We then describe the design of Z-checker, in which we integrated evaluation metrics proposed in prior work as well as other analysis tools. Specifically, for lossy compressor developers, Z-checker can be used to characterize critical properties of any data set (such as entropy, distribution, power spectrum, principal component analysis, and autocorrelation) in order to improve compression strategies. For lossy compression users, Z-checker can measure the compression quality (compression ratio and bit rate) and provide various global distortion analyses comparing the original data with the decompressed data (peak signal-to-noise ratio, normalized mean squared error, rate-distortion, rate-compression error, spectral, distribution, and derivatives), as well as statistical analysis of the compression error (maximum, minimum, and average error; autocorrelation; and distribution of errors). Z-checker can perform the analysis at either coarse granularity (over the whole data set) or fine granularity (over user-defined blocks), so that users and developers can select the best-fit, adaptive compressors for different parts of the data set. Z-checker also features a visualization interface that displays all analysis results, along with basic views of the data sets such as time series. To the best of our knowledge, Z-checker is the first tool designed to comprehensively assess lossy compression for scientific data sets.
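
As a concrete illustration of the metrics listed above, the following sketch (plain NumPy, not Z-checker's actual API; all function names are hypothetical, and the NMSE normalization shown is just one common convention) computes the compression ratio, bit rate, a few global distortion metrics, and a blockwise fine-granularity variant:

    import numpy as np

    def compression_stats(orig_bytes, comp_bytes, n_values):
        # Compression ratio and bit rate (compressed bits per data value).
        return orig_bytes / comp_bytes, 8.0 * comp_bytes / n_values

    def distortion_stats(orig, decomp):
        # Global error metrics comparing original vs. decompressed arrays.
        # Assumes a lossy result, i.e., nonzero error.
        err = decomp - orig
        mse = np.mean(err ** 2)
        value_range = orig.max() - orig.min()
        psnr = 20 * np.log10(value_range) - 10 * np.log10(mse)  # PSNR in dB
        nmse = mse / np.mean(orig ** 2)  # one common NMSE normalization
        e = err - err.mean()
        acf1 = np.dot(e[:-1], e[1:]) / np.dot(e, e)  # lag-1 error autocorrelation
        abs_err = np.abs(err)
        return {"max_err": abs_err.max(), "min_err": abs_err.min(),
                "avg_err": abs_err.mean(), "psnr_db": psnr,
                "nmse": nmse, "err_autocorr_lag1": acf1}

    def blockwise_psnr(orig, decomp, block=1024):
        # Fine-granularity analysis: PSNR per user-defined 1-D block,
        # useful when choosing different compressors for different regions.
        psnrs = []
        for i in range(0, orig.size, block):
            o, d = orig[i:i + block], decomp[i:i + block]
            mse = np.mean((d - o) ** 2)
            rng = o.max() - o.min()
            psnrs.append(20 * np.log10(rng) - 10 * np.log10(mse)
                         if mse > 0 and rng > 0 else np.inf)
        return np.array(psnrs)

Calling distortion_stats(x, x_hat) on a float array and its decompressed counterpart yields the kind of global report the abstract describes; a rate-distortion curve follows by sweeping the compressor's error bound and plotting PSNR against the bit rate from compression_stats.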

Authors:
 Tao, Dingwen [1]; Di, Sheng [2]; Guo, Hanqi [2]; Chen, Zizhong [3]; Cappello, Franck [4]
  1. Department of Computer Science and Engineering, University of California, Riverside, CA, USA
  2. Division of Computer Science and Mathematics, Argonne National Laboratory, Lemont, IL, USA
  3. Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Beijing University of Technology, Beijing, China
  4. Division of Computer Science and Mathematics, Argonne National Laboratory, Lemont, IL, USA; Parallel Computing Institute, University of Illinois Urbana–Champaign, Champaign, IL, USA
Publication Date:
November 2017
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE; National Science Foundation (NSF)
OSTI Identifier:
1437773
Alternate Identifier(s):
OSTI ID: 1510019
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Published Article
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 33; Journal Issue: 2; Journal ID: ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; framework; lossy compression; assessment tool; data analytics; scientific data; visualization

Citation Formats

Tao, Dingwen, Di, Sheng, Guo, Hanqi, Chen, Zizhong, and Cappello, Franck. Z-checker: A framework for assessing lossy compression of scientific data. United States: N. p., 2017. Web. doi:10.1177/1094342017737147.
Tao, Dingwen, Di, Sheng, Guo, Hanqi, Chen, Zizhong, & Cappello, Franck. Z-checker: A framework for assessing lossy compression of scientific data. United States. doi:10.1177/1094342017737147.
Tao, Dingwen, Di, Sheng, Guo, Hanqi, Chen, Zizhong, and Cappello, Franck. 2017. "Z-checker: A framework for assessing lossy compression of scientific data". United States. doi:10.1177/1094342017737147.
@article{osti_1437773,
title = {Z-checker: A framework for assessing lossy compression of scientific data},
author = {Tao, Dingwen and Di, Sheng and Guo, Hanqi and Chen, Zizhong and Cappello, Franck},
doi = {10.1177/1094342017737147},
journal = {International Journal of High Performance Computing Applications},
number = 2,
volume = 33,
place = {United States},
year = {2017},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1177/1094342017737147

Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science


Works referenced in this record:

Exploration of Lossy Compression for Application-Level Checkpoint/Restart
conference, May 2015

  • Sasaki, Naoto; Sato, Kento; Endo, Toshio
  • 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2015.67

Fast Error-Bounded Lossy HPC Data Compression with SZ
conference, May 2016

  • Di, Sheng; Cappello, Franck
  • 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2016.11

Fast Lossless Compression of Scientific Floating-Point Data
conference, January 2006

  • Ratanaworabhan, P.; Ke, J.; Burtscher, M.
  • Data Compression Conference (DCC'06)
  • DOI: 10.1109/DCC.2006.35

Fixed-Rate Compressed Floating-Point Arrays
journal, December 2014

  • Lindstrom, Peter
  • IEEE Transactions on Visualization and Computer Graphics, Vol. 20, Issue 12
  • DOI: 10.1109/TVCG.2014.2346458

HACC: extreme scaling and performance across diverse architectures
journal, December 2016

  • Habib, Salman; Insley, Joe; Daniel, David
  • Communications of the ACM, Vol. 60, Issue 1
  • DOI: 10.1145/3015569

A study of the characteristics of white noise using the empirical mode decomposition method
journal, June 2004

  • Wu, Zhaohua; Huang, Norden E.
  • Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, Vol. 460, Issue 2046
  • DOI: 10.1098/rspa.2003.1221

ISABELA for effective in situ compression of scientific data
journal, July 2012

  • Lakshminarasimhan, Sriram; Shah, Neil; Ethier, Stephane
  • Concurrency and Computation: Practice and Experience, Vol. 25, Issue 4
  • DOI: 10.1002/cpe.2887

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
conference, May 2017

  • Tao, Dingwen; Di, Sheng; Chen, Zizhong
  • 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2017.115

The JPEG still picture compression standard
journal, January 1992

  • Wallace, G. K.
  • IEEE Transactions on Consumer Electronics, Vol. 38, Issue 1
  • DOI: 10.1109/30.125072

Assessing the Effects of Data Compression in Simulations Using Physically Motivated Metrics
journal, January 2014

  • Laney, Daniel; Langer, Steven; Weber, Christopher
  • Scientific Programming, Vol. 22, Issue 2
  • DOI: 10.1155/2014/835419

A methodology for evaluating the impact of data compression on climate simulation data
conference, January 2014

  • Baker, Allison H.; Xu, Haiying; Dennis, John M.
  • Proceedings of the 23rd international symposium on High-performance parallel and distributed computing - HPDC '14
  • DOI: 10.1145/2600212.2600217

The Earth System Grid: Supporting the Next Generation of Climate Modeling Research
journal, March 2005

  • Bernholdt, David; Bharathi, Shishir; Brown, David
  • Proceedings of the IEEE, Vol. 93, Issue 3
  • DOI: 10.1109/JPROC.2004.842745

A universal algorithm for sequential data compression
journal, May 1977

  • Ziv, Jacob; Lempel, Abraham
  • IEEE Transactions on Information Theory, Vol. 23, Issue 3
  • DOI: 10.1109/TIT.1977.1055714

NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing
conference, November 2014

  • Chen, Zhengzhang; Son, Seung Woo; Hendrix, William
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2014.65

Advanced Photon Source
journal, March 2016


Industrial-era global ocean heat uptake doubles in recent decades
journal, January 2016

  • Gleckler, Peter J.; Durack, Paul J.; Stouffer, Ronald J.
  • Nature Climate Change, Vol. 6, Issue 4
  • DOI: 10.1038/nclimate2915

A Method for the Construction of Minimum-Redundancy Codes
journal, September 1952

  • Huffman, David A.
  • Proceedings of the IRE, Vol. 40, Issue 9
  • DOI: 10.1109/JRPROC.1952.273898