Title: ISABELA for effective in situ compression of scientific data: ISABELA FOR EFFECTIVE IN-SITU REDUCTION OF SPATIO-TEMPORAL DATA

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.2887 · OSTI ID: 1564924
;  [1];  [2];  [2];  [2];  [3];  [4];  [4];
  1. North Carolina State University, Raleigh, NC, 27695, USA
  2. Princeton Plasma Physics Laboratory, Princeton, NJ, 08543, USA
  3. Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
  4. Argonne National Laboratory, Argonne, IL, 60439, USA

Exploding dataset sizes from extreme-scale scientific simulations necessitate efficient data management and reduction schemes to mitigate I/O costs. With the discrepancy between I/O bandwidth and computational power, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Although data compression can be an effective solution, the random nature of real-valued scientific datasets renders lossless compression routines ineffective. These techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access. To address this problem, we propose an effective method for In situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a pre-conditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈ 85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression, in terms of data reduction and accuracy. We extend our previous paper by additionally building a communication-free, scalable parallel storage framework on top of ISABELA-compressed data that is ideally suited for extreme-scale analytical processing. The basis for our storage framework is an inherently local decompression method (it need not decode the entire data), which allows for random access decompression and low-overhead task division that can be exploited over heterogeneous architectures. Furthermore, analytical operations such as correlation and query processing run quickly and accurately over data in the compressed space.
Copyright © 2012 John Wiley & Sons, Ltd.
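
The sort-then-fit idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed window of 1024 values, and it swaps the B-spline least-squares fit for plain linear interpolation between evenly spaced samples of the sorted curve (a degree-1 spline), so that it runs with the Python standard library alone. The names `compress_window`, `decompress_point`, and `pearson`, and the choice of 30 knots, are hypothetical. The sketch does not attempt to reproduce the ≈ 85% reduction or the temporal compression described above.

```python
import bisect
import random


def compress_window(values, n_knots=30):
    """Sort one window; keep each element's rank plus a few samples of the sorted curve.

    Assumes 2 <= n_knots <= len(values).
    """
    n = len(values)
    # order[rank] = original position of the element with that rank
    order = sorted(range(n), key=values.__getitem__)
    sorted_vals = [values[i] for i in order]
    # Evenly spaced ranks, including both ends of the sorted curve.
    knot_ranks = [round(k * (n - 1) / (n_knots - 1)) for k in range(n_knots)]
    knot_vals = [sorted_vals[r] for r in knot_ranks]
    # rank_of[i] = rank of original element i (the permutation index).
    rank_of = [0] * n
    for rank, i in enumerate(order):
        rank_of[i] = rank
    return rank_of, knot_ranks, knot_vals


def decompress_point(i, rank_of, knot_ranks, knot_vals):
    """Approximate original element i without decoding the rest of the window."""
    r = rank_of[i]
    # Locate the knot segment that contains rank r.
    k = min(bisect.bisect_right(knot_ranks, r) - 1, len(knot_ranks) - 2)
    r0, r1 = knot_ranks[k], knot_ranks[k + 1]
    t = (r - r0) / (r1 - r0)
    return knot_vals[k] + t * (knot_vals[k + 1] - knot_vals[k])


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Uniform noise stands in for "seemingly random" data: hard to model in its
# original order, but monotone and smooth once sorted.
random.seed(0)
window = [random.random() for _ in range(1024)]
model = compress_window(window)
restored = [decompress_point(i, *model) for i in range(len(window))]
print(f"correlation with original: {pearson(window, restored):.4f}")
```

Sorting turns the noisy window into a monotone curve that a handful of knots can describe, and because each element is recovered from its own rank and one knot segment, a single value can be decoded without touching the rest of the window. In this sketch the stored ranks make up most of what is kept per window.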

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); UT-Battelle LLC/ORNL, Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1564924
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 25, Issue 4; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
