ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems X. Lecture Notes in Computer Science, vol 8220
- North Carolina State Univ., Raleigh, NC (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
High-performance computing architectures face nontrivial data processing challenges, as computational and I/O components further diverge in performance trajectories. For scientific data analysis in particular, methods based on generating heavyweight access acceleration structures, e.g. indexes, are becoming less feasible for ever-increasing dataset sizes. We present ALACRITY, demonstrating the effectiveness of a fused data and index encoding of scientific, floating-point data in generating lightweight data structures amenable to common types of queries used in scientific data analysis. We exploit the representation of floating-point values by extracting significant bytes, using the resulting unique values to bin the remaining data along fixed-precision boundaries. To optimize query processing, we use an inverted index, mapping each generated bin to a list of records contained within, allowing us to optimize query processing with attribute range constraints. Overall, the storage footprint for both index and data is shown to be below numerous configurations of bitmap indexing, while matching or outperforming query performance.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- OSTI ID:
- 1567532
- Country of Publication:
- United States
- Language:
- English
Similar Records
Parallel membership queries on very large scientific data sets using bitmap indexes
Minimizing I/O Costs of Multi-Dimensional Queries with BitmapIndices