Title: TagIt: An Integrated Indexing and Search Service for File Systems

Conference
 [1];  [2];  [3];  [3];  [3];  [4]
  1. Virginia Tech and Oak Ridge National Laboratory
  2. Sogang University
  3. Oak Ridge National Laboratory
  4. Virginia Tech

Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems, and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata call for revisiting this decoupled design philosophy. In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system. A key feature of TagIt is a scalable, distributed metadata indexing framework, on top of which we implement a flexible tagging capability to support data discovery. A tag can also be associated with an active operator for pre-processing, filtering, or automatic metadata extraction, which we seamlessly offload to file servers in a load-aware fashion. Our evaluation shows that TagIt can expedite data search by up to 10X over the extant decoupled approach.
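
The ideas named in the abstract (per-server tag indexes in a shared-nothing layout, scatter-gather search, and load-aware offloading of operators) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not TagIt's actual interface or implementation: the names `FileServer`, `Cluster`, `tag`, `search`, and `offload`, the hash-based placement, and the "fewest assigned operators" load metric are all hypothetical and chosen for illustration.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileServer:
    """Hypothetical shared-nothing server: owns some files and a local tag index."""
    name: str
    # Maps (tag key, tag value) -> set of file paths stored on this server.
    index: dict = field(default_factory=lambda: defaultdict(set))
    assigned_ops: int = 0  # crude load metric (assumption): operators run so far

    def tag(self, path, key, value):
        self.index[(key, value)].add(path)

    def search(self, key, value):
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.index.get((key, value), ()))


class Cluster:
    """Hypothetical front end that routes tagging, search, and operators."""

    def __init__(self, names):
        self.servers = [FileServer(n) for n in names]

    def owner(self, path):
        # Assumed placement policy: deterministic hash of the path.
        return self.servers[zlib.crc32(path.encode()) % len(self.servers)]

    def tag(self, path, key, value):
        # The index entry lives on the server that owns the file.
        self.owner(path).tag(path, key, value)

    def search(self, key, value):
        # Scatter the query to every server, then merge the partial results.
        hits = set()
        for server in self.servers:
            hits |= server.search(key, value)
        return sorted(hits)

    def offload(self, operator, path, key):
        # Load-aware choice (assumption): run on the least-loaded server.
        target = min(self.servers, key=lambda s: s.assigned_ops)
        target.assigned_ops += 1
        value = operator(path)       # e.g. automatic metadata extraction
        self.tag(path, key, value)   # store the extracted value as a tag
        return target.name


if __name__ == "__main__":
    cluster = Cluster(["fs0", "fs1"])
    cluster.tag("/data/run1.nc", "project", "climate")
    # Hypothetical operator: derive a "format" tag from the file extension.
    cluster.offload(lambda p: p.rsplit(".", 1)[-1], "/data/run1.nc", "format")
    print(cluster.search("format", "nc"))
```

In this sketch a search touches only the servers' local indexes rather than an external database, which is the integration point the abstract emphasizes; a real system would also need persistence, index updates on file changes, and a more realistic load metric.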

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1567467
Resource Relation:
Conference: SC '17 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Country of Publication:
United States
Language:
English

Publication Date:
Wed Nov 01 00:00:00 EDT 2017