
DART: distributed adaptive radix tree for efficient affix-based keyword search on HPC systems

Conference ·
 [1];  [2];  [2];  [1]
  1. Texas Tech University
  2. Lawrence Berkeley National Laboratory

© 2018 Association for Computing Machinery. Affix-based search is a fundamental functionality for storage systems. It allows users to find datasets whose attributes match a given affix. While building an inverted index to facilitate efficient affix-based keyword search is common practice for standalone databases and desktop file systems, building local indexes or adopting indexing techniques designed for a standalone data store is insufficient for high-performance computing (HPC) systems, because of the massive amount of data and the distributed nature of the storage devices within such a system. In this paper, we propose the Distributed Adaptive Radix Tree (DART) to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach scales well: it achieves efficient affix-based search while alleviating imbalanced keyword distribution and excessive requests on popular keywords at scale. Our evaluation at different scales shows that, compared with the "full string hashing" use case of the most popular distributed indexing technique, the Distributed Hash Table (DHT), DART achieves up to 55× better throughput for prefix and suffix searches, while achieving comparable throughput for exact and infix searches. Also, compared with the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution across distributed nodes and alleviates excessive query workload against popular keywords.
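
The two DHT baselines named in the abstract can be illustrated with a small single-process sketch. This is not DART and not the authors' code: the server count, the keyword list, and every function name below are hypothetical, and the placement rules are only one plausible reading of "full string hashing" (hash the whole keyword) and "initial hashing" (hash the first character). Under those assumptions, the sketch shows why a prefix query must be sent to every server in the first scheme, and why the second scheme routes it to one server but concentrates keywords there.

```python
import hashlib
from collections import defaultdict

NUM_SERVERS = 4  # hypothetical cluster size, chosen only for illustration


def stable_hash(text: str) -> int:
    """Deterministic hash; Python's built-in hash() is salted per process."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def full_string_server(keyword: str) -> int:
    """Assumed 'full string hashing': place a keyword by hashing all of it."""
    return stable_hash(keyword) % NUM_SERVERS


def initial_server(keyword: str) -> int:
    """Assumed 'initial hashing': place a keyword by hashing its first character."""
    return stable_hash(keyword[0]) % NUM_SERVERS


def build_index(keywords, placement):
    """Map each server id to the set of keywords it would store."""
    index = defaultdict(set)
    for kw in keywords:
        index[placement(kw)].add(kw)
    return index


def prefix_search(index, prefix, servers):
    """Scan only the given servers and collect keywords starting with prefix."""
    hits = set()
    for server in servers:
        hits |= {kw for kw in index.get(server, ()) if kw.startswith(prefix)}
    return sorted(hits)


# Hypothetical attribute keywords.
keywords = ["temperature", "temp_max", "tempo", "pressure", "time", "tile"]

full_index = build_index(keywords, full_string_server)
init_index = build_index(keywords, initial_server)

# Full string hashing: keywords sharing a prefix may land on any server,
# so the query has to be broadcast to all of them.
full_hits = prefix_search(full_index, "temp", range(NUM_SERVERS))

# Initial hashing: the first character of the prefix identifies the one
# server that can hold matches, so a single request is enough.
init_hits = prefix_search(init_index, "temp", [initial_server("temp")])

# The cost of that shortcut: every keyword starting with "t" shares a server.
t_servers = {initial_server(kw) for kw in keywords if kw.startswith("t")}

print(full_hits)
print(init_hits)
print(len(t_servers))
```

Both queries return the same matches, but the first contacts all `NUM_SERVERS` servers while the second contacts one. In this toy keyword set, five of the six keywords begin with "t" and therefore share a single server under initial hashing, which is the kind of imbalanced keyword distribution the abstract says DART is designed to alleviate.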

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1602812
Resource Relation:
Conference: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques - PACT '18, November 1–4, 2018, Limassol, Cyprus
Country of Publication:
United States
Language:
English
