OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimal Chunking of Large Multidimensional Arrays for Data Warehousing

Journal Article · INFORMATION SYSTEMS
OSTI ID:934985

Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications, referred to as MOLAP. Such arrays are stored on disk by partitioning the large global array into fixed-size sub-arrays called chunks or tiles, which form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays and therefore access all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is: "What shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker, who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0 percent of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of Sarawagi and Stonebraker can deviate considerably from the true result as the number of dimensions increases and may also lead to suboptimal chunk shapes.
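
The metric described in the abstract can be illustrated with a small sketch. It is not the model or the solution method of the paper: it assumes queries are axis-aligned boxes placed uniformly at random on integer offsets, ignores array boundaries, and replaces steepest descent and geometric programming with an exhaustive search over integer chunk shapes of a fixed volume. All function names and the sample workload are hypothetical.

```python
from itertools import product
from math import prod


def chunks_touched(start, query_shape, chunk_shape):
    """Count the chunks overlapped by a query box whose low corner is `start`."""
    count = 1
    for s, q, c in zip(start, query_shape, chunk_shape):
        first = s // c            # index of the first chunk hit on this axis
        last = (s + q - 1) // c   # index of the last chunk hit on this axis
        count *= last - first + 1
    return count


def expected_chunks(query_shape, chunk_shape):
    """Closed form under the uniform-placement assumption (no boundary effects)."""
    return prod(1 + (q - 1) / c for q, c in zip(query_shape, chunk_shape))


def expected_chunks_enumerated(query_shape, chunk_shape):
    """Average chunks_touched over every offset of the query within one chunk."""
    offsets = product(*(range(c) for c in chunk_shape))
    total = sum(chunks_touched(o, query_shape, chunk_shape) for o in offsets)
    return total / prod(chunk_shape)


def factorizations(volume, ndim):
    """Yield every integer chunk shape with `ndim` sides whose product is `volume`."""
    if ndim == 1:
        yield (volume,)
        return
    for side in range(1, volume + 1):
        if volume % side == 0:
            for rest in factorizations(volume // side, ndim - 1):
                yield (side,) + rest


def best_chunk_shape(workload, volume):
    """Exhaustive search (an assumption of this sketch, not the paper's method)."""
    ndim = len(workload[0])

    def cost(shape):
        return sum(expected_chunks(q, shape) for q in workload) / len(workload)

    return min(factorizations(volume, ndim), key=cost)


if __name__ == "__main__":
    # Hypothetical workload: three 2-D query shapes, chunks of 64 cells each.
    workload = [(100, 10), (80, 20), (120, 5)]
    shape = best_chunk_shape(workload, volume=64)
    print("chunk shape:", shape)
    for q in workload:
        print(q, "expected chunks:", expected_chunks(q, shape))
```

The enumerated average is included only as a cross-check on the closed form for small shapes. Exhaustive search is practical here because the chunk volume is tiny; it does not scale to realistic chunk sizes or many dimensions.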

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Computational Research Division
DOE Contract Number:
DE-AC02-05CH11231
Report Number(s):
LBNL-697E; TRN: US200815%%68
Journal Information:
Journal Name: INFORMATION SYSTEMS
Country of Publication:
United States
Language:
English

Similar Records

Chunking of Large Multidimensional Arrays
Technical Report · February 28, 2007 · OSTI ID:934985

An efficient abstract interface for multidimensional array I/O
Conference · December 31, 1994 · OSTI ID:934985

The fast cubic algorithm
Conference · December 31, 1994 · OSTI ID:934985