U.S. Department of Energy
Office of Scientific and Technical Information

The compression–error trade-off for large gridded data sets

Journal Article · Geoscientific Model Development (Online)

The netCDF-4 format is widely used for large gridded scientific data sets and includes several compression methods: lossy linear scaling and the lossless deflate and shuffle algorithms. Many multidimensional geoscientific data sets exhibit considerable variation over one or several spatial dimensions (e.g., vertically) with less variation in the remaining dimensions (e.g., horizontally). On such data sets, linear scaling with a single pair of scale and offset parameters often entails considerable loss of precision. We introduce an alternative compression method called "layer-packing" that simultaneously exploits lossy linear scaling and lossless compression. Layer-packing stores arrays (instead of a scalar pair) of scale and offset parameters. An implementation of this method is compared with three alternatives in terms of compression ratio, accuracy, and speed: lossless compression, storing data at fixed relative precision (bit-grooming), and scalar linear packing.
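The idea of storing one scale and offset per layer, rather than one pair for the whole array, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `layer_pack` and `layer_unpack`, the choice of unsigned 16-bit integers, and the synthetic test field are all assumptions made for illustration only.

```python
import numpy as np

def layer_pack(data, axis=0):
    """Pack floats into uint16 with one scale/offset pair per layer along `axis`.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    # Reduce over every axis except the strongly varying ("thick") one.
    reduce_axes = tuple(i for i in range(data.ndim) if i != axis)
    lo = data.min(axis=reduce_axes, keepdims=True)
    hi = data.max(axis=reduce_axes, keepdims=True)
    scale = (hi - lo) / 65535.0
    # A constant layer has zero range; use a dummy scale to avoid dividing by zero.
    scale = np.where(scale == 0, 1.0, scale)
    packed = np.rint((data - lo) / scale).astype(np.uint16)
    return packed, scale, lo

def layer_unpack(packed, scale, offset):
    """Invert layer_pack (up to quantization error) by broadcasting the arrays."""
    return packed * scale + offset

# Synthetic field: strong variation along axis 0, weak variation within each layer.
rng = np.random.default_rng(0)
levels = np.logspace(5, 0, 20)
field = levels[:, None, None] * (1 + 0.01 * rng.standard_normal((20, 8, 8)))

# Layer-packing: arrays of scale and offset, one entry per layer.
packed, scale, offset = layer_pack(field, axis=0)
layer_err = np.abs(layer_unpack(packed, scale, offset) - field) / np.abs(field)

# Scalar linear packing: a single scale/offset pair for the whole array.
g_lo, g_hi = field.min(), field.max()
g_scale = (g_hi - g_lo) / 65535.0
scalar = np.rint((field - g_lo) / g_scale) * g_scale + g_lo
scalar_err = np.abs(scalar - field) / np.abs(field)

print("max relative error, layer-packed: ", layer_err.max())
print("max relative error, scalar-packed:", scalar_err.max())
```

On a field like this one, the single global scale is set by the largest layer, so the small-valued layers lose most of their precision, whereas the per-layer parameters keep the relative error small in every layer. The packed integer array would then still be passed through a lossless compressor such as deflate.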

When viewed as a trade-off between compression and error, layer-packing yields similar results to bit-grooming (storing between 3 and 4 significant figures). Bit-grooming and layer-packing offer significantly better control of precision than scalar linear packing. The relative performance of bit-groomed and layer-packed data, in terms of compression and errors, was strongly predicted by the entropy of the exponent array, and lossless compression was well predicted by the entropy of the original data array. Layer-packed data files must be "unpacked" to be readily usable. The compression and precision characteristics make layer-packing a competitive archive format for many scientific data sets.
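As a rough illustration of what an "entropy of the exponent array" might look like, the sketch below computes the Shannon entropy of the binary exponents returned by `numpy.frexp`. The abstract does not give the exact definition used in the paper, so this formulation, and the helper names `shannon_entropy` and `exponent_entropy`, are assumptions.

```python
import numpy as np

def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def exponent_entropy(data):
    """Entropy of the binary exponents of `data` (hypothetical definition)."""
    # frexp splits x into mantissa m and integer exponent e with x = m * 2**e.
    _, exponents = np.frexp(np.asarray(data, dtype=np.float64))
    return shannon_entropy(exponents.ravel())

# Values confined to [1, 2) all share one binary exponent.
narrow = np.linspace(1.0, 1.9, 1000)
# Values spread over six decades span many binary exponents.
wide = np.logspace(0, 6, 1000)

print("exponent entropy, narrow range:", exponent_entropy(narrow))
print("exponent entropy, wide range:  ", exponent_entropy(wide))
```

Under this assumed definition, a low exponent entropy means the values share a similar magnitude, and a high one means they range over many orders of magnitude.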

Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Grant/Contract Number:
SC0012998
OSTI ID:
1341345
Journal Information:
Geoscientific Model Development (Online), Vol. 10, Issue 1; ISSN 1991-9603
Publisher:
Copernicus Publications, EGU
Country of Publication:
Germany
Language:
English
