Title: Query estimation and order-optimized iteration in very large federations

Conference
OSTI ID:656716

Objectivity federated databases may contain many terabytes of data and span thousands of files. In such an environment, it is easy for a user to pose a query that returns an iterator over millions of objects and requires opening thousands of databases. This presentation describes several technologies developed for such settings: (1) a query estimator, which tells the user how many objects satisfy the query, and how many databases will be touched, before any of those files are opened; (2) an order-optimized iterator, which behaves like an ordinary iterator except that elements are returned in an order optimized for efficient access, presorted by the database (and container) in which they reside; (3) a parallel implementation of the order-optimized iterator, allowing any number of processes in a parallel or distributed system to iterate over disjoint subcollections of items satisfying the query, partitioned by the database or container in which the items reside. These technologies have been developed for scientific experiments that will require handling thousands of terabytes of data annually, but they are intended to be applicable in other massive data settings as well. In such environments, significant amounts of data will reside on tertiary storage, accessible via Objectivity's recently announced HPSS (High Performance Storage System) interface. When deployed in large-scale physics settings later in 1998, the query estimator will further inform the user of the number of tape mounts required to satisfy the query, and provide rough time estimates for data delivery. The order-optimized iterator will be connected to a cache manager that will prefetch from tape to disk the files needed by the query (known from the query estimation step), and will decide which items to deliver to the user next according to the order in which data become available in the disk cache.
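
The three components described above can be illustrated with a minimal Python sketch. It is not the Objectivity API or the authors' implementation. It assumes the query has already been resolved against some index into lightweight references that record where each object lives; the names `Ref`, `estimate`, `order_optimized`, and `partition` are hypothetical, and the greedy load balancing in `partition` is an illustrative choice, not something stated in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Ref(NamedTuple):
    """Hypothetical object reference: records location only, not the object."""
    database: str
    container: str
    oid: int


def estimate(refs: Iterable[Ref]) -> Tuple[int, int]:
    """Return (matching objects, distinct databases) without opening any file."""
    refs = list(refs)
    return len(refs), len({r.database for r in refs})


def order_optimized(refs: Iterable[Ref]) -> Iterator[Ref]:
    """Yield refs presorted by database, then container, so that all items
    from one file are delivered together."""
    yield from sorted(refs, key=lambda r: (r.database, r.container, r.oid))


def partition(refs: Iterable[Ref], workers: int) -> List[List[Ref]]:
    """Split refs into disjoint per-worker lists; a database is never shared
    between two workers."""
    by_db: Dict[str, List[Ref]] = defaultdict(list)
    for r in refs:
        by_db[r.database].append(r)
    shares: List[List[Ref]] = [[] for _ in range(workers)]
    # Assumed balancing rule: give the largest remaining database
    # to the worker that currently holds the fewest items.
    for db in sorted(by_db, key=lambda d: (-len(by_db[d]), d)):
        lightest = min(range(workers), key=lambda i: len(shares[i]))
        shares[lightest].extend(order_optimized(by_db[db]))
    return shares


# Made-up query hits, listed in arbitrary order.
hits = [
    Ref("db2", "c1", 7),
    Ref("db1", "c2", 3),
    Ref("db2", "c1", 4),
    Ref("db1", "c1", 9),
    Ref("db3", "c1", 1),
]
count, databases = estimate(hits)
ordered = list(order_optimized(hits))
shares = partition(hits, 2)
```

Partitioning by database rather than by individual item means that no file has to be opened by more than one process, which is the property the parallel iterator in the abstract relies on. A tape-aware variant would presumably reorder delivery by cache availability instead of by name, but that is outside this sketch.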

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Energy Research, Washington, DC (United States)
DOE Contract Number:
W-31109-ENG-38
Report Number(s):
ANL/HEP/CP-98-38; CONF-980577-; ON: DE98057832; TRN: 99:000244
Resource Relation:
Conference: Objectivity worldview '98 conference, Berkeley, CA (United States), 14-15 May 1998; Other Information: PBD: 4 May 1998
Country of Publication:
United States
Language:
English