OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Chunking of Large Multidimensional Arrays

Abstract

Data-intensive scientific computations, as well as on-line analytical processing applications, are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of the storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.
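
The metric named in the abstract can be illustrated with a small sketch. It is not the report's probabilistic model or its optimization method; it only counts, by direct arithmetic, how many fixed-size chunks an axis-aligned query box overlaps, and averages that count over a workload. The function names `chunks_touched` and `mean_chunks`, the inclusive-bounds convention, and the sample workload are assumptions made for illustration.

```python
def chunks_touched(query_lo, query_hi, chunk_shape):
    """Count the chunks overlapped by the inclusive box [query_lo, query_hi].

    Assumes chunks are aligned to the array origin, so along each dimension
    the index of the chunk holding cell x is x // chunk_length.
    """
    count = 1
    for lo, hi, c in zip(query_lo, query_hi, chunk_shape):
        # Number of distinct chunk indices spanned along this dimension.
        count *= hi // c - lo // c + 1
    return count


def mean_chunks(workload, chunk_shape):
    """Average number of chunks retrieved over a list of (lo, hi) queries."""
    total = sum(chunks_touched(lo, hi, chunk_shape) for lo, hi in workload)
    return total / len(workload)


# Hypothetical workload on a 2-dimensional array: one row-shaped query
# and one small square query straddling a chunk corner.
workload = [((0, 0), (0, 15)), ((3, 3), (4, 4))]

# Three chunk shapes holding the same number of cells (16 each).
for shape in [(4, 4), (1, 16), (16, 1)]:
    print(shape, mean_chunks(workload, shape))
```

Even with equal chunk volume, the average differs by shape, which is why the choice of chunk shape for a given query workload is treated as an optimization problem.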

Authors:
Rotem, Doron; Otoo, Ekow J.; Seshadri, Sridhar
Publication Date:
2007-02-28
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director. Office of Science. Advanced Scientific Computing Research
OSTI Identifier:
927033
Report Number(s):
LBNL-63230
R&D Project: 429201; BnR: KJ0101030; TRN: US200810%%206
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99; EFFICIENCY; EXACT SOLUTIONS; MATHEMATICAL MODELS; METRICS; PROCESSING; PROGRAMMING; STORAGE; Multi-dimensional Arrays Algorithm Array Chunking

Citation Formats

Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States: N. p., 2007. Web. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., & Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. 2007. "Chunking of Large Multidimensional Arrays". United States. doi:10.2172/927033. https://www.osti.gov/servlets/purl/927033.
@article{osti_927033,
title = {Chunking of Large Multidimensional Arrays},
author = {Rotem, Doron and Otoo, Ekow J. and Seshadri, Sridhar},
abstractNote = {Data-intensive scientific computations, as well as on-line analytical processing applications, are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of the storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.},
doi = {10.2172/927033},
place = {United States},
year = {2007},
month = {February}
}
