TagIt: an integrated indexing and search service for file systems
- Virginia Tech and Oak Ridge National Laboratory
- Sogang University
- Oak Ridge National Laboratory
- Virginia Tech
Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata call for revisiting this decoupled design philosophy. In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system. A key feature of TagIt is a scalable, distributed metadata indexing framework, on which we implement a flexible tagging capability to support data discovery. A tag can also be associated with an active operator for pre-processing, filtering, or automatic metadata extraction, which we seamlessly offload to file servers in a load-aware fashion. Our evaluation shows that TagIt can expedite data search by up to 10X over the extant decoupled approach.
- Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1567467
- Resource Relation: Conference: SC '17 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
- Country of Publication: United States
- Language: English