Title: Evaluating lossy data compression on climate simulation data within a large ensemble

Journal Article · Geoscientific Model Development (Online)
 [1]; [1]; [1]; [1]; [2]; [3]; [1]; [4]; [4]; [5]; [5]; [5]; [1]; [6]; [7]
  1. National Center for Atmospheric Research, Boulder, CO (United States)
  2. ETH Zurich (Switzerland). Institute for Atmospheric and Climate Science
  3. Laboratoire des Sciences du Climat et de l'Environnement, Gif-sur-Yvette (France)
  4. Colorado State Univ., Fort Collins, CO (United States). Department of Electrical and Computer Engineering
  5. CNR-Institute of Atmospheric Pollution Research, Rende (Italy). Division of Rende, UNICAL-Polifunzional
  6. Univ. of Colorado, Boulder, CO (United States). Department of Oceanic and Atmospheric Sciences
  7. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing

High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.
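
As a rough illustration of the criterion stated in the abstract (compression effects should not be statistically distinguishable from natural variability), the Python sketch below applies a toy lossy step to one member of a synthetic ensemble and checks how far that member's mean moves relative to the spread of the other members. This is not the authors' method, and it does not use any of the compressors evaluated in the article. The synthetic data, the mantissa-truncation step, and the names `truncate_mantissa` and `ensemble_z_score` are all assumptions made for illustration.

```python
import random
import statistics
import struct


def truncate_mantissa(value, keep_bits):
    """Toy lossy step: zero all but `keep_bits` of the 23 float32 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (reconstructed,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return reconstructed


def ensemble_z_score(member_mean, other_means):
    """Offset of one member's mean from the others, in ensemble standard deviations."""
    spread = statistics.stdev(other_means)
    return (member_mean - statistics.mean(other_means)) / spread


# Synthetic stand-in for an ensemble (assumption): 30 members, each with 1000
# temperature-like values near 288 K and a small member-specific offset.
rng = random.Random(0)
members = []
for _ in range(30):
    offset = rng.gauss(0.0, 0.1)
    members.append([288.0 + offset + rng.gauss(0.0, 2.0) for _ in range(1000)])

# "Compress and reconstruct" the first member only.
original = members[0]
reconstructed = [truncate_mantissa(v, keep_bits=16) for v in original]

# Compare the member's mean, before and after, against the other 29 members.
other_means = [statistics.mean(m) for m in members[1:]]
z_original = ensemble_z_score(statistics.mean(original), other_means)
z_reconstructed = ensemble_z_score(statistics.mean(reconstructed), other_means)
max_abs_error = max(abs(a - b) for a, b in zip(original, reconstructed))

print(f"max abs error:          {max_abs_error:.5f}")
print(f"z-score, original:      {z_original:+.3f}")
print(f"z-score, reconstructed: {z_reconstructed:+.3f}")
```

With 16 of the 23 mantissa bits kept, values near 288 are reproduced to within roughly 0.004, which is small compared with the member-to-member spread in this synthetic setup, so the z-score should shift only slightly. A real evaluation would look at many variables and features rather than a single mean.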

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC52-07NA27344
OSTI ID:
1389988
Report Number(s):
LLNL-JRNL-691060
Journal Information:
Geoscientific Model Development (Online), Vol. 9, Issue 12; ISSN 1991-9603
Publisher:
European Geosciences Union
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 28 works
Citation information provided by
Web of Science
