U.S. Department of Energy
Office of Scientific and Technical Information

Title: ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems X. Lecture Notes in Computer Science, vol 8220

Book
 [1];  [1];  [1];  [1];  [1];  [1];  [2];  [2];  [3];  [3];  [4];  [5];  [1]
  1. North Carolina State Univ., Raleigh, NC (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)
  3. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  4. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  5. Argonne National Lab. (ANL), Argonne, IL (United States)

High-performance computing architectures face nontrivial data-processing challenges as the performance trajectories of computational and I/O components continue to diverge. For scientific data analysis in particular, methods that build heavyweight access-acceleration structures, e.g., indexes, are becoming less feasible as dataset sizes keep growing. We present ALACRITY, which demonstrates that a fused data and index encoding of scientific floating-point data yields lightweight data structures amenable to the query types common in scientific data analysis. We exploit the representation of floating-point values by extracting their significant (high-order) bytes and using the resulting unique values to bin the remaining data along fixed-precision boundaries. To accelerate queries, we use an inverted index that maps each bin to the list of records it contains, which lets queries with attribute range constraints be answered efficiently. Overall, the combined storage footprint of index and data is shown to be smaller than that of numerous bitmap-indexing configurations, while query performance matches or exceeds theirs.
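The binning and inverted-index idea in the abstract can be illustrated with a minimal Python sketch. It is not the ALACRITY implementation, and several choices are assumptions: values are 64-bit IEEE 754 doubles with no NaNs; the top two bytes (`HIGH_BYTES`, a hypothetical parameter) serve as the bin key; an order-preserving bit transform (a standard trick swapped in here, not taken from the paper) is applied first so that bin keys sort in numeric order even for negative values; and the low-order bytes are kept uncompressed in plain Python lists. All function names are hypothetical.

```python
import struct
from collections import defaultdict

# Hypothetical parameter: number of high-order bytes used as the bin key.
HIGH_BYTES = 2
SHIFT = 8 * (8 - HIGH_BYTES)
LOW_MASK = (1 << SHIFT) - 1
ALL_ONES = (1 << 64) - 1


def to_ordered_bits(x: float) -> int:
    """Map a non-NaN double to a 64-bit unsigned int whose order matches numeric order."""
    (u,) = struct.unpack(">Q", struct.pack(">d", x))
    # Negative values: flip all bits; non-negative values: set the sign bit.
    return u ^ ALL_ONES if u >> 63 else u | (1 << 63)


def from_ordered_bits(k: int) -> float:
    """Exact inverse of to_ordered_bits, so the encoding is lossless."""
    u = k ^ (1 << 63) if k >> 63 else k ^ ALL_ONES
    (x,) = struct.unpack(">d", struct.pack(">Q", u))
    return x


def build_index(values):
    """Bin records by their high-order bytes: {bin_key: (record_ids, low_order_parts)}."""
    index = defaultdict(lambda: ([], []))
    for rid, v in enumerate(values):
        k = to_ordered_bits(v)
        ids, lows = index[k >> SHIFT]
        ids.append(rid)
        lows.append(k & LOW_MASK)  # remaining (low-order) bytes, stored per bin
    return dict(sorted(index.items()))


def _bound_key(x: float, upper: bool) -> int:
    # -0.0 and +0.0 compare equal but encode differently; widen the bound to cover both.
    if x == 0.0:
        x = 0.0 if upper else -0.0
    return to_ordered_bits(x)


def range_query(index, lo: float, hi: float):
    """Return sorted record IDs whose value v satisfies lo <= v <= hi."""
    lo_bin = _bound_key(lo, upper=False) >> SHIFT
    hi_bin = _bound_key(hi, upper=True) >> SHIFT
    hits = []
    for b, (ids, lows) in index.items():
        if b < lo_bin or b > hi_bin:
            continue  # bin lies entirely outside the range
        if lo_bin < b < hi_bin:
            hits.extend(ids)  # interior bin: every record qualifies, no decoding needed
            continue
        # Boundary bin: reconstruct each value exactly and check it against the range.
        for rid, low in zip(ids, lows):
            v = from_ordered_bits((b << SHIFT) | low)
            if lo <= v <= hi:
                hits.append(rid)
    return sorted(hits)


data = [3.14, -2.5, 0.001, 42.0, 3.15, -0.0, 1e10]
idx = build_index(data)
print(range_query(idx, 0.0, 10.0))  # → [0, 2, 4, 5]
```

The main design point is the split between interior bins and boundary bins. Records in interior bins are returned directly from the inverted list without touching their low-order bytes. Only the two bins that hold the range endpoints, at most, are decoded and checked value by value. A practical version would find the qualifying bin range by binary search over the sorted keys instead of scanning every bin.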

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
OSTI ID:
1567532
Country of Publication:
United States
Language:
English
