Parallel membership queries on very large scientific data sets using bitmap indexes
Journal Article · Concurrency and Computation. Practice and Experience
© 2019 John Wiley & Sons, Ltd. Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging because the data are often stored in specially formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called a membership query, which identifies whether the queried elements of a set are members of an attribute. Attributes that naturally take classification values appear frequently both in scientific domains (for example, category and object type) and in daily life (for example, zip code and occupation). Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Thanks to compression algorithms such as Word-Aligned Hybrid, bitmap indexing provides high performance not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set. We conducted experiments in a highly parallelized environment on data obtained from a particle accelerator model and on a synthetic data set.
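The general idea of answering a membership query with a bitmap index can be illustrated with a minimal sketch. This is not the authors' implementation: it uses uncompressed Python integers as bitmaps (no Word-Aligned Hybrid compression), a thread pool only to show how partitions can be processed independently, and a made-up `particle_type` column. All function names here are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, queried):
    """Return a bitmap of rows whose value is in `queried` (OR of per-value bitmaps)."""
    result = 0
    for value in queried:
        result |= index.get(value, 0)  # values absent from the index match no rows
    return result


def rows_of(bitmap, offset=0):
    """Decode a bitmap into ascending row numbers, shifted by `offset`."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(offset + row)
        bitmap >>= 1
        row += 1
    return rows


def parallel_membership_query(values, queried, chunk_size=4, workers=2):
    """Index and query fixed-size chunks independently, then merge the row numbers."""
    starts = range(0, len(values), chunk_size)

    def run(start):
        index = build_bitmap_index(values[start:start + chunk_size])
        return rows_of(membership_query(index, queried), offset=start)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(run, starts)  # results come back in chunk order
        return [row for part in parts for row in part]


# Illustrative data only; not taken from the paper's data sets.
particle_type = ["e", "mu", "e", "pi", "k", "pi", "e", "mu", "k"]
hits = parallel_membership_query(particle_type, {"pi", "k"})
print(hits)
```

Each queried value costs one bitmap lookup and one bitwise OR, so the query never scans the raw column. Because every chunk carries its own index, chunks can be evaluated independently and their results concatenated, which is the property a parallel evaluation relies on.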
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1503658
- Journal Information:
- Journal Name: Concurrency and Computation. Practice and Experience; Journal Volume: 31; Journal Issue: 15; ISSN 1532-0626
- Country of Publication:
- United States
- Language:
- English