Title: Query estimation and order-optimized iteration in very large federations

Abstract

Objectivity federated databases may contain many terabytes of data and span thousands of files. In such an environment, it is often easy for a user to pose a query that may return an iterator over millions of objects, requiring opening thousands of databases. This presentation describes several technologies developed for such settings: (1) a query estimator, which tells the user how many objects satisfy the query, and how many databases will be touched, prior to opening all of those files; (2) an order-optimized iterator, which behaves like an ordinary iterator except that elements are returned in an order optimized for efficient access, presorted by the database (and container) in which they reside; (3) a parallel implementation of the order-optimized iterator, allowing any number of processes in a parallel or distributed system to iterate over disjoint subcollections of terms satisfying the query, partitioned by the database or container in which the items reside. These technologies have been developed for scientific experiments that will require handling thousands of terabytes of data annually, but they are intended to be applicable in other massive data settings as well. In such environments, significant amounts of data will reside on tertiary storage, accessible via Objectivity's recently announced HPSS (High Performance Storage System) interface. When deployed in large-scale physics settings later in 1998, the query estimator will further inform the user of the number of tape mounts required to satisfy the query, and provide rough time estimates for data delivery. The order-optimized iterator will be connected to a cache manager that will prefetch from tape to disk the files needed by the query (known from the query estimation step), and will decide which items to deliver to the user next according to the order in which data become available in the disk cache.
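
The three ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Objectivity API. It assumes, purely for illustration, that each object matching a query is represented by a hypothetical tuple `(database_id, container_id, object_id)`, and the function names `estimate`, `order_optimized`, and `partition_by_database` are invented here. The greedy balancing used for partitioning is likewise an assumption, not something the abstract specifies.

```python
from collections import defaultdict

# Hypothetical object reference: (database_id, container_id, object_id).
# A real federation would use its own object identifiers.

def estimate(refs):
    """Return (number of matching objects, number of distinct databases touched)."""
    refs = list(refs)
    return len(refs), len({ref[0] for ref in refs})

def order_optimized(refs):
    """Yield refs sorted by database, then container, so that all objects
    from one database (and container) are visited together."""
    return iter(sorted(refs))

def partition_by_database(refs, n_workers):
    """Split refs into n_workers disjoint lists, never splitting a database
    across workers. Largest databases are assigned first to the worker that
    currently holds the fewest objects (a simple greedy heuristic)."""
    by_db = defaultdict(list)
    for ref in refs:
        by_db[ref[0]].append(ref)
    parts = [[] for _ in range(n_workers)]
    for db in sorted(by_db, key=lambda d: (-len(by_db[d]), d)):
        lightest = min(range(n_workers), key=lambda i: len(parts[i]))
        parts[lightest].extend(sorted(by_db[db]))
    return parts
```

A caller might first run `estimate` on the matching references to decide whether the query is worth executing, then either walk `order_optimized(refs)` in a single process or hand each list from `partition_by_database(refs, n)` to a separate worker.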

Authors:
Malon, D M; HENP Grand Challenge Collaboration
Publication Date:
4 May 1998
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Energy Research, Washington, DC (United States)
OSTI Identifier:
656716
Report Number(s):
ANL/HEP/CP-98-38; CONF-980577-
ON: DE98057832; TRN: 99:000244
DOE Contract Number:  
W-31109-ENG-38
Resource Type:
Conference
Resource Relation:
Conference: Objectivity worldview '98 conference, Berkeley, CA (United States), 14-15 May 1998; Other Information: PBD: 4 May 1998
Country of Publication:
United States
Language:
English
Subject:
99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; 66 PHYSICS; INFORMATION SYSTEMS; INFORMATION RETRIEVAL; HIGH ENERGY PHYSICS; NUCLEAR PHYSICS; DATA PROCESSING; MAN-MACHINE SYSTEMS

Citation Formats

Malon, D M, and HENP Grand Challenge Collaboration. Query estimation and order-optimized iteration in very large federations. United States: N. p., 1998. Web.
Malon, D M, & HENP Grand Challenge Collaboration (1998). Query estimation and order-optimized iteration in very large federations. United States.
Malon, D M, and HENP Grand Challenge Collaboration. 1998. "Query estimation and order-optimized iteration in very large federations". United States. https://www.osti.gov/servlets/purl/656716.
@article{osti_656716,
title = {Query estimation and order-optimized iteration in very large federations},
author = {Malon, D M and HENP Grand Challenge Collaboration},
doi = {},
url = {https://www.osti.gov/biblio/656716},
journal = {},
number = {},
volume = {},
place = {United States},
year = {1998},
month = may
}
