DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Evaluating lossy data compression on climate simulation data within a large ensemble

Abstract

High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.

Authors:
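Baker, Allison H. [1]; Hammerling, Dorit M. [1]; Mickelson, Sheri A. [1]; Xu, Haiying [1]; Stolpe, Martin B. [2]; Naveau, Philippe [3]; Sanderson, Ben [1]; Ebert-Uphoff, Imme [4]; Samarasinghe, Savini [4]; De Simone, Francesco [5]; Carbone, Francesco [5]; Gencarelli, Christian N. [5]; Dennis, John M. [1]; Kay, Jennifer E. [6]; Lindstrom, Peter [7]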
  1. National Center for Atmospheric Research, Boulder, CO (United States)
  2. ETH Zurich (Switzerland). Institute for Atmospheric and Climate Science
  3. Laboratoire des Sciences du Climat et de l'Environnement, Gif-sur-Yvette (France)
  4. Colorado State Univ., Fort Collins, CO (United States). Department of Electrical and Computer Engineering
  5. CNR-Institute of Atmospheric Pollution Research, Rende (Italy). Division of Rende, UNICAL-Polifunzional
  6. Univ. of Colorado, Boulder, CO (United States). Department of Oceanic and Atmospheric Sciences
  7. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing
Publication Date:
2016-12-07
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1389988
Report Number(s):
LLNL-JRNL-691060
Journal ID: ISSN 1991-9603
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
Geoscientific Model Development (Online)
Additional Journal Information:
Journal Name: Geoscientific Model Development (Online); Journal Volume: 9; Journal Issue: 12; Journal ID: ISSN 1991-9603
Publisher:
European Geosciences Union
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES; 97 MATHEMATICS AND COMPUTING; 54 ENVIRONMENTAL SCIENCES

Citation Formats

Baker, Allison H., Hammerling, Dorit M., Mickelson, Sheri A., Xu, Haiying, Stolpe, Martin B., Naveau, Philippe, Sanderson, Ben, Ebert-Uphoff, Imme, Samarasinghe, Savini, De Simone, Francesco, Carbone, Francesco, Gencarelli, Christian N., Dennis, John M., Kay, Jennifer E., and Lindstrom, Peter. Evaluating lossy data compression on climate simulation data within a large ensemble. United States: N. p., 2016. Web. doi:10.5194/gmd-9-4381-2016.
Baker, Allison H., Hammerling, Dorit M., Mickelson, Sheri A., Xu, Haiying, Stolpe, Martin B., Naveau, Philippe, Sanderson, Ben, Ebert-Uphoff, Imme, Samarasinghe, Savini, De Simone, Francesco, Carbone, Francesco, Gencarelli, Christian N., Dennis, John M., Kay, Jennifer E., & Lindstrom, Peter. Evaluating lossy data compression on climate simulation data within a large ensemble. United States. https://doi.org/10.5194/gmd-9-4381-2016
Baker, Allison H., Hammerling, Dorit M., Mickelson, Sheri A., Xu, Haiying, Stolpe, Martin B., Naveau, Philippe, Sanderson, Ben, Ebert-Uphoff, Imme, Samarasinghe, Savini, De Simone, Francesco, Carbone, Francesco, Gencarelli, Christian N., Dennis, John M., Kay, Jennifer E., and Lindstrom, Peter. 2016. "Evaluating lossy data compression on climate simulation data within a large ensemble". United States. https://doi.org/10.5194/gmd-9-4381-2016. https://www.osti.gov/servlets/purl/1389988.
@article{osti_1389988,
title = {Evaluating lossy data compression on climate simulation data within a large ensemble},
author = {Baker, Allison H. and Hammerling, Dorit M. and Mickelson, Sheri A. and Xu, Haiying and Stolpe, Martin B. and Naveau, Philippe and Sanderson, Ben and Ebert-Uphoff, Imme and Samarasinghe, Savini and De Simone, Francesco and Carbone, Francesco and Gencarelli, Christian N. and Dennis, John M. and Kay, Jennifer E. and Lindstrom, Peter},
doi = {10.5194/gmd-9-4381-2016},
journal = {Geoscientific Model Development (Online)},
number = 12,
volume = 9,
place = {United States},
year = {2016},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 28 works
Citation information provided by
Web of Science

Works referenced in this record:

Revisiting wavelet compression for large-scale climate data using JPEG 2000 and ensuring data precision
conference, October 2011

  • Woodring, Jonathan; Mniszewski, Susan; Brislawn, Christopher
  • 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV)
  • DOI: 10.1109/LDAV.2011.6092314

Causal Discovery for Climate Research Using Graphical Models
journal, September 2012


Climate Model Intercomparisons: Preparing for the Next Phase
journal, March 2014

  • Meehl, Gerald A.; Moss, Richard; Taylor, Karl E.
  • Eos, Transactions American Geophysical Union, Vol. 95, Issue 9
  • DOI: 10.1002/2014EO090001

Global and regional evolution of short-lived radiatively-active gases and aerosols in the Representative Concentration Pathways
journal, August 2011

  • Lamarque, Jean-François; Kyle, G. Page; Meinshausen, Malte
  • Climatic Change, Vol. 109, Issue 1-2
  • DOI: 10.1007/s10584-011-0155-0

Light-weight parallel Python tools for earth system modeling workflows
conference, October 2015

  • Paul, Kevin; Mickelson, Sheri; Dennis, John M.
  • 2015 IEEE International Conference on Big Data (Big Data)
  • DOI: 10.1109/BigData.2015.7363979

Spatio-temporal dynamics, patterns formation and turbulence in complex fluids due to electrohydrodynamics instabilities
journal, August 2011


Particle filtering for Gumbel-distributed daily maxima of methane and nitrous oxide
journal, December 2012

  • Toulemonde, Gwladys; Guillou, Armelle; Naveau, Philippe
  • Environmetrics, Vol. 24, Issue 1
  • DOI: 10.1002/env.2192

Assessing the effects of data compression in simulations using physically motivated metrics
conference, January 2013

  • Laney, Daniel; Langer, Steven; Weber, Christopher
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13)
  • DOI: 10.1145/2503210.2503283

Modelling Extremal Events for Insurance and Finance
book, January 1997


Parameter and Quantile Estimation for the Generalized Pareto Distribution
journal, August 1987


A High Performance Compression Method for Climate Data
conference, August 2014

  • Liu, Songbin; Huang, Xiaomeng; Ni, Yufang
  • 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)
  • DOI: 10.1109/ISPA.2014.18

Fast and Efficient Compression of Floating-Point Data
journal, September 2006

  • Lindstrom, Peter; Isenburg, Martin
  • IEEE Transactions on Visualization and Computer Graphics, Vol. 12, Issue 5
  • DOI: 10.1109/TVCG.2006.143

The Community Earth System Model (CESM) Large Ensemble Project: A Community Resource for Studying Climate Change in the Presence of Internal Climate Variability
journal, August 2015

  • Kay, J. E.; Deser, C.; Phillips, A.
  • Bulletin of the American Meteorological Society, Vol. 96, Issue 8
  • DOI: 10.1175/BAMS-D-13-00255.1

Czip: A Fast Lossless Compression Algorithm for Climate Data
journal, March 2016

  • Huang, Xiaomeng; Ni, Yufang; Chen, Dexun
  • International Journal of Parallel Programming, Vol. 44, Issue 6
  • DOI: 10.1007/s10766-016-0403-z

Causation, Prediction, and Search (2nd edition)
book, January 2001


Bayesian Spatial Modeling of Extreme Precipitation Return Levels
journal, September 2007

  • Cooley, Daniel; Nychka, Douglas; Naveau, Philippe
  • Journal of the American Statistical Association, Vol. 102, Issue 479
  • DOI: 10.1198/016214506000000780

Probability weighted moments compared with some traditional techniques in estimating Gumbel Parameters and quantiles
journal, October 1979

  • Landwehr, J. Maciunas; Matalas, N. C.; Wallis, J. R.
  • Water Resources Research, Vol. 15, Issue 5
  • DOI: 10.1029/WR015i005p01055

The Community Earth System Model: A Framework for Collaborative Research
journal, September 2013

  • Hurrell, James W.; Holland, M. M.; Gent, P. R.
  • Bulletin of the American Meteorological Society, Vol. 94, Issue 9
  • DOI: 10.1175/BAMS-D-12-00121.1

On the block maxima method in extreme value theory: PWM estimators
journal, February 2015

  • Ferreira, Ana; de Haan, Laurens
  • The Annals of Statistics, Vol. 43, Issue 1
  • DOI: 10.1214/14-AOS1280

Exascale Storage Systems - An Analytical Study of Expenses
journal, March 2014

  • Kunkel, Julian Martin; Kuhn, Michael; Ludwig, Thomas
  • Supercomputing Frontiers and Innovations, Vol. 1, Issue 1
  • DOI: 10.14529/jsfi140106

Extreme Value Theory
book, January 2006

  • de Haan, Laurens; Ferreira, Ana
  • Springer Series in Operations Research and Financial Engineering
  • DOI: 10.1007/0-387-34471-3

Data Compression for Climate Data
journal, June 2016

  • Kuhn, Michael; Kunkel, Julian; Ludwig, Thomas
  • Supercomputing Frontiers and Innovations, Vol. 3, Issue 1
  • DOI: 10.14529/jsfi160105

4 Radiation budget of the climate system (Part 2/5)
book, January 2005


Evaluating Modes of Variability in Climate Models
journal, December 2014

  • Phillips, Adam S.; Deser, Clara; Fasullo, John
  • Eos, Transactions American Geophysical Union, Vol. 95, Issue 49
  • DOI: 10.1002/2014EO490002

Finding the Goldilocks zone: Compression-error trade-off for large gridded datasets
journal, July 2016

  • Silver, Jeremy D.; Zender, Charles S.
  • Geoscientific Model Development Discussions
  • DOI: 10.5194/gmd-2016-177

Statistics of extremes in hydrology
journal, August 2002


A new ensemble-based consistency test for the Community Earth System Model (pyCECT v1.0)
journal, January 2015

  • Baker, A. H.; Hammerling, D. M.; Levy, M. N.
  • Geoscientific Model Development, Vol. 8, Issue 9
  • DOI: 10.5194/gmd-8-2829-2015

Improving floating point compression through binary masks
conference, October 2013


Fixed-Rate Compressed Floating-Point Arrays
journal, December 2014

  • Lindstrom, Peter
  • IEEE Transactions on Visualization and Computer Graphics, Vol. 20, Issue 12
  • DOI: 10.1109/TVCG.2014.2346458

Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications
conference, May 2013

  • Bicer, Tekin; Yin, Jian; Chiu, David
  • 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2013.81

A methodology for evaluating the impact of data compression on climate simulation data
conference, January 2014

  • Baker, Allison H.; Xu, Haiying; Dennis, John M.
  • Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '14)
  • DOI: 10.1145/2600212.2600217

A Gaussian graphical model approach to climate networks
journal, June 2014

  • Zerenner, Tanja; Friederichs, Petra; Lehnertz, Klaus
  • Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 24, Issue 2
  • DOI: 10.1063/1.4870402

Semi-parametric tail inference through probability-weighted moments
journal, February 2011

  • Caeiro, Frederico; Ivette Gomes, M.
  • Journal of Statistical Planning and Inference, Vol. 141, Issue 2
  • DOI: 10.1016/j.jspi.2010.08.015

A fast nonparametric spatio-temporal regression scheme for generalized Pareto distributed heavy precipitation
journal, May 2014

  • Naveau, P.; Toreti, A.; Smith, I.
  • Water Resources Research, Vol. 50, Issue 5
  • DOI: 10.1002/2014wr015431

Introduction
book, January 2012


Limiting forms of the frequency distribution of the largest or smallest member of a sample
journal, April 1928

  • Fisher, R. A.; Tippett, L. H. C.
  • Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 24, Issue 2
  • DOI: 10.1017/s0305004100015681

Data Driven Methods for Nonlinear Granger Causality: Climate Teleconnection Mechanisms
text, January 2005

  • Chu, Tianjiao; Danks, David; Glymour, Clark
  • Carnegie Mellon University
  • DOI: 10.1184/r1/6491327

Works referencing / citing this record:

Axially symmetric models for global data: A journey between geostatistics and stochastic generators
journal, January 2019

  • Porcu, E.; Castruccio, S.; Alegría, A.
  • Environmetrics, Vol. 30, Issue 1
  • DOI: 10.1002/env.2555

Lossy Data Compression Effects on Wall-bounded Turbulence: Bounds on Data Reduction
journal, May 2018

  • Otero, Evelyn; Vinuesa, Ricardo; Marin, Oana
  • Flow, Turbulence and Combustion, Vol. 101, Issue 2
  • DOI: 10.1007/s10494-018-9923-5

A Multivariate Global Spatiotemporal Stochastic Generator for Climate Ensembles
journal, February 2019

  • Edwards, Matthew; Castruccio, Stefano; Hammerling, Dorit
  • Journal of Agricultural, Biological and Environmental Statistics, Vol. 24, Issue 3
  • DOI: 10.1007/s13253-019-00352-8

Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data
journal, June 2019

  • Baker, A. H.; Hammerling, D. M.; Turton, T. L.
  • Computer Graphics Forum, Vol. 38, Issue 3
  • DOI: 10.1111/cgf.13707

Use cases of lossy compression for floating-point data in scientific data sets
journal, May 2019

  • Cappello, Franck; Di, Sheng; Li, Sihuan
  • The International Journal of High Performance Computing Applications, Vol. 33, Issue 6
  • DOI: 10.1177/1094342019853336

Compression Challenges in Large Scale Partial Differential Equation Solvers
journal, September 2019

  • Götschel, Sebastian; Weiser, Martin
  • Algorithms, Vol. 12, Issue 9
  • DOI: 10.3390/a12090197

Requirements for a global data infrastructure in support of CMIP6
journal, January 2018

  • Balaji, Venkatramani; Taylor, Karl E.; Juckes, Martin
  • Geoscientific Model Development, Vol. 11, Issue 9
  • DOI: 10.5194/gmd-11-3659-2018

Is Smaller Always Better? - Evaluating Video Compression Techniques for Simulation Ensembles
text, January 2021


Z-checker: A framework for assessing lossy compression of scientific data
journal, November 2017

  • Tao, Dingwen; Di, Sheng; Guo, Hanqi
  • The International Journal of High Performance Computing Applications, Vol. 33, Issue 2
  • DOI: 10.1177/1094342017737147

Reducing storage of global wind ensembles with stochastic generators
journal, March 2018

  • Jeong, Jaehong; Castruccio, Stefano; Crippa, Paola
  • The Annals of Applied Statistics, Vol. 12, Issue 1
  • DOI: 10.1214/17-aoas1105

Compression challenges in large scale PDE solvers
text, January 2019


A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)
journal, January 2017

  • Hassell, David; Gregory, Jonathan; Blower, Jon
  • Geoscientific Model Development, Vol. 10, Issue 12
  • DOI: 10.5194/gmd-10-4619-2017

Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files
journal, September 2019

  • Delaunay, Xavier; Courtois, Aurélie; Gouillon, Flavien
  • Geoscientific Model Development, Vol. 12, Issue 9
  • DOI: 10.5194/gmd-12-4099-2019

Lossy compression of Earth system model data based on a hierarchical tensor with Adaptive-HGFDR (v1.0)
journal, February 2021

  • Yu, Zhaoyuan; Li, Dongshuang; Zhang, Zhengfang
  • Geoscientific Model Development, Vol. 14, Issue 2
  • DOI: 10.5194/gmd-14-875-2021