Query estimation and order-optimized iteration in very large federations
Conference · OSTI ID:656716
Objectivity federated databases may contain many terabytes of data and span thousands of files. In such an environment, a user can easily pose a query that returns an iterator over millions of objects and requires opening thousands of databases. This presentation describes several technologies developed for such settings: (1) a query estimator, which tells the user how many objects satisfy the query, and how many databases will be touched, before any of those files are opened; (2) an order-optimized iterator, which behaves like an ordinary iterator except that elements are returned in an order optimized for efficient access, presorted by the database (and container) in which they reside; (3) a parallel implementation of the order-optimized iterator, allowing any number of processes in a parallel or distributed system to iterate over disjoint subcollections of items satisfying the query, partitioned by the database or container in which the items reside. These technologies have been developed for scientific experiments that will require handling thousands of terabytes of data annually, but they are intended to be applicable in other massive data settings as well. In such environments, significant amounts of data will reside on tertiary storage, accessible via Objectivity's recently announced HPSS (High Performance Storage System) interface. When deployed in large-scale physics settings later in 1998, the query estimator will further inform the user of the number of tape mounts required to satisfy the query, and provide rough time estimates for data delivery. The order-optimized iterator will be connected to a cache manager that will prefetch from tape to disk the files needed by the query (known from the query estimation step), and will decide which items to deliver to the user next according to the order in which data become available in the disk cache.
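The three techniques in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Objectivity API. It assumes the locations of matching objects are already known from some lightweight index, represented here by a hypothetical `ObjRef` record; the function names `estimate`, `order_optimized`, and `partition_by_database` are likewise invented for illustration, and the greedy load balancing in the last function is one possible choice rather than something the abstract specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class ObjRef(NamedTuple):
    """Hypothetical reference to a matching object and where it resides."""
    database: str
    container: str
    oid: int


def estimate(matches: Iterable[ObjRef]) -> Tuple[int, int]:
    """Return (number of matching objects, number of databases touched)."""
    count = 0
    databases = set()
    for ref in matches:
        count += 1
        databases.add(ref.database)
    return count, len(databases)


def order_optimized(matches: Iterable[ObjRef]) -> Iterator[ObjRef]:
    """Yield matches presorted by database, then container, then object id."""
    yield from sorted(matches, key=lambda r: (r.database, r.container, r.oid))


def partition_by_database(matches: Iterable[ObjRef],
                          workers: int) -> List[List[ObjRef]]:
    """Split matches into disjoint per-worker lists, keeping each database whole."""
    by_db: Dict[str, List[ObjRef]] = defaultdict(list)
    for ref in order_optimized(matches):
        by_db[ref.database].append(ref)
    parts: List[List[ObjRef]] = [[] for _ in range(workers)]
    # Assumed policy: hand the largest remaining database to the
    # currently least-loaded worker (ties broken by name and index).
    for db in sorted(by_db, key=lambda d: (-len(by_db[d]), d)):
        target = min(range(workers), key=lambda i: len(parts[i]))
        parts[target].extend(by_db[db])
    return parts
```

In this sketch each worker would open only the databases assigned to it, and within a worker the items arrive grouped by database and container, so no file needs to be reopened once the iteration has moved past it.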
- Research Organization: Argonne National Lab., IL (United States)
- Sponsoring Organization: USDOE Office of Energy Research, Washington, DC (United States)
- DOE Contract Number: W-31109-ENG-38
- OSTI ID: 656716
- Report Number(s): ANL/HEP/CP--98-38; CONF-980577--; ON: DE98057832
- Country of Publication: United States
- Language: English
Similar Records
New capabilities in the HENP grand challenge storage access system and its application at RHIC
Conference · Tue Apr 25 00:00:00 EDT 2000 · OSTI ID:901026
Milestone Report - Level-2 Milestone 5589: Modernization and Expansion of LLNL Archive Disk Cache
Technical Report · Wed Feb 03 23:00:00 EST 2016 · OSTI ID:1239198
Distributed data access in the sequential access model at the D0 experiment at Fermilab
Conference · Wed Jul 05 00:00:00 EDT 2000 · OSTI ID:757586