U.S. Department of Energy
Office of Scientific and Technical Information

Parallel membership queries on very large scientific data sets using bitmap indexes

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5157 · OSTI ID: 1503658
© 2019 John Wiley & Sons, Ltd.
Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging because the data are often stored in files with specific formats, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called a membership query, which identifies whether the queried elements of a set are members of an attribute. Attributes that naturally take classification values appear frequently in scientific domains, such as category and object type, as well as in daily life, such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Thanks to compression algorithms such as Word-Aligned Hybrid, bitmap indexing provides high performance not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set. We conducted experiments in a highly parallelized environment on data obtained from a particle accelerator model and on a synthetic data set.
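To make the idea of a membership query over a bitmap index concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes an equality-encoded index (one bitmap per distinct attribute value) stored in plain Python integers, and it answers an IN-list query by OR-ing the bitmaps of the queried values. The names `build_bitmap_index`, `membership_query`, and the `particle_type` column are hypothetical, and the sketch omits both the Word-Aligned Hybrid compression and the parallelization described in the abstract.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] equals it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, wanted):
    """Return the row ids whose attribute value is in `wanted` (an IN-list query)."""
    hits = 0
    for value in wanted:
        # OR together one bitmap per queried value; unknown values contribute nothing.
        hits |= index.get(value, 0)
    rows = []
    row = 0
    while hits:
        if hits & 1:
            rows.append(row)
        hits >>= 1
        row += 1
    return rows


# Hypothetical classification attribute with four distinct values.
particle_type = ["electron", "proton", "electron", "muon", "proton", "photon"]
index = build_bitmap_index(particle_type)
matches = membership_query(index, {"electron", "muon"})
print(matches)
```

Because each queried value costs one bitwise OR over a whole bitmap, the query never scans the raw column. A production index would keep the bitmaps compressed and could split the rows across workers, which this sketch leaves out.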
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1503658
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 31, Issue 15; ISSN 1532-0626
Country of Publication:
United States
Language:
English

