
DOE PAGES

Title: The compression–error trade-off for large gridded data sets

The netCDF-4 format is widely used for large gridded scientific data sets and includes several compression methods: lossy linear scaling and the lossless deflate and shuffle algorithms. Many multidimensional geoscientific data sets exhibit considerable variation over one or several spatial dimensions (e.g., vertically) with less variation in the remaining dimensions (e.g., horizontally). On such data sets, linear scaling with a single pair of scale and offset parameters often entails considerable loss of precision. We introduce an alternative compression method called "layer-packing" that simultaneously exploits lossy linear scaling and lossless compression. Layer-packing stores arrays of scale and offset parameters instead of a single scalar pair. An implementation of this method is compared with lossless compression, bit-grooming (storing data at fixed relative precision), and scalar linear packing in terms of compression ratio, accuracy, and speed. When viewed as a trade-off between compression and error, layer-packing yields similar results to bit-grooming (storing between 3 and 4 significant figures). Bit-grooming and layer-packing offer significantly better control of precision than scalar linear packing. The relative performance of bit-groomed and layer-packed data, in terms of compression and errors, was strongly predicted by the entropy of the exponent array, and lossless compression was well predicted by the entropy of the original data array. Layer-packed data files must be "unpacked" to be readily usable. The compression and precision characteristics make layer-packing a competitive archive format for many scientific data sets.
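To make the contrast between scalar linear packing and layer-wise packing concrete, here is a minimal pure-Python sketch. It assumes unsigned 16-bit integer codes, one "layer" per vertical level, and the netCDF `scale_factor`/`add_offset` convention (value = packed × scale + offset). The function names, the synthetic field, and the exponent-entropy definition (Shannon entropy of the base-2 exponents returned by `math.frexp`) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

NMAX = 2**16 - 1  # largest unsigned 16-bit code (assumed packing width)

def pack(values):
    """Linearly pack floats into integer codes 0..NMAX; return (codes, scale, offset)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / NMAX if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def unpack(codes, scale, offset):
    # Same convention as netCDF scale_factor/add_offset: x = packed * scale + offset
    return [c * scale + offset for c in codes]

def scalar_pack_roundtrip(field):
    """Scalar packing: one (scale, offset) pair shared by every layer."""
    flat = [v for layer in field for v in layer]
    restored = unpack(*pack(flat))
    out, i = [], 0
    for layer in field:  # split the flat list back into layers
        out.append(restored[i:i + len(layer)])
        i += len(layer)
    return out

def layer_pack_roundtrip(field):
    """Hypothetical layer-packing: a separate (scale, offset) pair per layer."""
    return [unpack(*pack(layer)) for layer in field]

def max_rel_error(field, restored):
    """Largest relative round-trip error over all nonzero values."""
    return max(abs(r - v) / abs(v)
               for layer, back in zip(field, restored)
               for v, r in zip(layer, back) if v != 0)

def exponent_entropy(values):
    """Shannon entropy (bits) of the base-2 exponents of the values (assumed definition)."""
    counts = Counter(math.frexp(v)[1] for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic field: 6 "vertical" layers spanning 5 orders of magnitude, while
# values within each "horizontal" layer vary by less than a factor of 2.
field = [[10.0**-k * (1 + 0.01 * j) for j in range(100)] for k in range(6)]

scalar_err = max_rel_error(field, scalar_pack_roundtrip(field))
layer_err = max_rel_error(field, layer_pack_roundtrip(field))
print(f"scalar packing: max relative error {scalar_err:.3g}")
print(f"layer packing:  max relative error {layer_err:.3g}")
print(f"exponent entropy: {exponent_entropy([v for l in field for v in l]):.3g} bits")
```

With a single scale and offset, the quantization step is set by the largest layer, so the smallest layer collapses onto very few codes and its relative error becomes large; per-layer parameters keep the step proportional to each layer's own range. The cost is storing one parameter pair per layer instead of one per variable.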
Authors:
Silver, Jeremy D. [1]; Zender, Charles S. [2]
  1. Univ. of Melbourne, Melbourne (Australia)
  2. Univ. of California, Irvine, CA (United States)
Publication Date:
January 2017
Grant/Contract Number:
SC0012998
Type:
Published Article
Journal Name:
Geoscientific Model Development (Online)
Additional Journal Information:
Journal Name: Geoscientific Model Development (Online); Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1991-9603
Publisher:
European Geosciences Union
Research Org:
Univ. of California, Irvine, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES; 97 MATHEMATICS AND COMPUTING
OSTI Identifier:
1341345
Alternate Identifier(s):
OSTI ID: 1367178

Silver, Jeremy D., and Zender, Charles S. The compression–error trade-off for large gridded data sets. United States: N. p., 2017. Web. doi:10.5194/gmd-10-413-2017.
Silver, Jeremy D., & Zender, Charles S. The compression–error trade-off for large gridded data sets. United States. doi:10.5194/gmd-10-413-2017.
Silver, Jeremy D., and Zender, Charles S. 2017. "The compression–error trade-off for large gridded data sets". United States. doi:10.5194/gmd-10-413-2017.
@article{osti_1341345,
title = {The compression–error trade-off for large gridded data sets},
author = {Silver, Jeremy D. and Zender, Charles S.},
abstractNote = {The netCDF-4 format is widely used for large gridded scientific data sets and includes several compression methods: lossy linear scaling and the lossless deflate and shuffle algorithms. Many multidimensional geoscientific data sets exhibit considerable variation over one or several spatial dimensions (e.g., vertically) with less variation in the remaining dimensions (e.g., horizontally). On such data sets, linear scaling with a single pair of scale and offset parameters often entails considerable loss of precision. We introduce an alternative compression method called "layer-packing" that simultaneously exploits lossy linear scaling and lossless compression. Layer-packing stores arrays of scale and offset parameters instead of a single scalar pair. An implementation of this method is compared with lossless compression, bit-grooming (storing data at fixed relative precision), and scalar linear packing in terms of compression ratio, accuracy, and speed. When viewed as a trade-off between compression and error, layer-packing yields similar results to bit-grooming (storing between 3 and 4 significant figures). Bit-grooming and layer-packing offer significantly better control of precision than scalar linear packing. The relative performance of bit-groomed and layer-packed data, in terms of compression and errors, was strongly predicted by the entropy of the exponent array, and lossless compression was well predicted by the entropy of the original data array. Layer-packed data files must be "unpacked" to be readily usable. The compression and precision characteristics make layer-packing a competitive archive format for many scientific data sets.},
doi = {10.5194/gmd-10-413-2017},
journal = {Geoscientific Model Development (Online)},
number = 1,
volume = 10,
place = {United States},
year = {2017},
month = {1}
}