An Integrated Indexing and Search Service for Distributed File Systems
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sogang Univ., Seoul (Korea, Republic of)
- Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems, and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata entail revisiting the decoupled file system-data services design philosophy. In this article, we present TagIt, a scalable data management service framework aimed at scientific datasets, which can be integrated into prevalent distributed file system architectures. A key feature of TagIt is a scalable, distributed metadata indexing framework, which provides a flexible tagging capability to support data discovery. Furthermore, tags can be associated with an active operator for pre-processing, filtering, or automatic metadata extraction, which we seamlessly offload to file servers in a load-aware fashion. We have integrated TagIt into two popular distributed file systems, GlusterFS and CephFS. Our evaluation demonstrates that TagIt can expedite data search operations by up to 10× over the extant decoupled approach.
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1632079
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 31, Issue 10; ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English