Chunking of Large Multidimensional Arrays
Abstract
Data intensive scientific computations as well as online analytical processing applications are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular subarrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of subarrays in a manner that accesses all chunks that overlap the query results. An important metric of the storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real life data sets, show that our chunking is much more efficient than the existing approximate solutions.
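The cost metric named in the abstract, the expected number of chunks a range query touches, has a simple closed form under one common modeling assumption: the query's start offset is uniformly distributed relative to the chunk grid. The sketch below illustrates that assumption only; it is not the paper's two probabilistic models, which the abstract does not spell out. Per dimension, a query of extent q over chunks of extent c touches 1 + (q - 1)/c chunks on average, and independent dimensions multiply.

```python
from math import prod


def expected_chunks(query_shape, chunk_shape):
    """Expected number of chunks overlapped by a range query whose start
    offset is uniform over chunk offsets: product over dimensions of
    1 + (q_i - 1) / c_i."""
    return prod(1 + (q - 1) / c for q, c in zip(query_shape, chunk_shape))


def chunks_touched(start, query_shape, chunk_shape):
    """Exact chunk count for a query covering indices start .. start+q-1
    in each dimension: last chunk index minus first, plus one."""
    return prod((s + q - 1) // c - s // c + 1
                for s, q, c in zip(start, query_shape, chunk_shape))
```

For example, a 1-D query of extent 6 over chunks of extent 4 gives `expected_chunks((6,), (4,)) == 2.25`, which matches the average of `chunks_touched` over the four possible start offsets within a chunk.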
 Authors:
 Rotem, Doron; Otoo, Ekow J.; Seshadri, Sridhar
 Publication Date:
 2007-02-28
 Research Org.:
 Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
 Sponsoring Org.:
 USDOE Director, Office of Science, Advanced Scientific Computing Research
 OSTI Identifier:
 927033
 Report Number(s):
 LBNL-63230
R&D Project: 429201; BnR: KJ0101030; TRN: US200810%%206
 DOE Contract Number:
 DE-AC02-05CH11231
 Resource Type:
 Technical Report
 Country of Publication:
 United States
 Language:
 English
 Subject:
 99; EFFICIENCY; EXACT SOLUTIONS; MATHEMATICAL MODELS; METRICS; PROCESSING; PROGRAMMING; STORAGE; Multidimensional Arrays; Algorithm; Array Chunking
Citation Formats
Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States: N. p., 2007.
Web. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., & Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. 2007.
"Chunking of Large Multidimensional Arrays". United States.
doi:10.2172/927033. https://www.osti.gov/servlets/purl/927033.
@article{osti_927033,
title = {Chunking of Large Multidimensional Arrays},
author = {Rotem, Doron and Otoo, Ekow J. and Seshadri, Sridhar},
abstractNote = {Data intensive scientific computations as well as online analytical processing applications are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular subarrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of subarrays in a manner that accesses all chunks that overlap the query results. An important metric of the storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real life data sets, show that our chunking is much more efficient than the existing approximate solutions.},
doi = {10.2172/927033},
place = {United States},
year = {2007},
month = {feb}
}

The SCORE-EVET code was developed to study multidimensional transient fluid flow in nuclear reactor fuel rod arrays. The conservation equations used were derived by volume averaging the transient compressible three-dimensional local continuum equations in Cartesian coordinates. No assumptions associated with subchannel flow have been incorporated into the derivation of the conservation equations. In addition to the three-dimensional fluid flow equations, the SCORE-EVET code contains: (a) a one-dimensional steady state solution scheme to initialize the flow field, (b) steady state and transient fuel rod conduction models, and (c) comprehensive correlation packages to describe fluid-to-fuel rod interfacial energy and momentum exchange. Velocity …

Hydrodynamic prediction of multidimensional single and two-phase flow in rod arrays. Progress report, January 1–December 31, 1983
The objective of this research is to develop comprehensive constitutive models for multidimensional two-phase flow in rod arrays. The constitutive parameters are the solid-fluid flow resistance and the gas-liquid interfacial momentum exchange force. This report covers work in four areas: (1) a correlation for flow resistance across banks of tubes which is independent of rod arrangement has been developed. The correlation was developed from data from three rod arrangements covering a Reynolds number range (based on superficial velocity) of 1 to 40,000; (2) complete pressure drop data for water flows in the laminar region in crossflow and 45° inclined …
Hydrodynamic prediction of multidimensional single and two-phase flow in rod arrays. Progress report, May 15–December 31, 1982. [LMFBR]
The objective of this research is to develop comprehensive constitutive models for the hydrodynamics of flows at oblique angles in rod arrays, and determine their impact on design and performance analysis of heat exchanging components. The constitutive parameters are, for single-phase flow, the solid-fluid flow resistance, and, for two-phase flow, the phase flow resistances and relative phase motion. This report covers accomplishments of three tasks: (1) superposition models, data, and correlations for multidimensional, single-phase flow resistance are reviewed and compared; (2) two-phase flow observations and an individual bubble trajectory model for oblique flows are presented; and (3) single-phase flow analyses …
Visualization of large, multidimensional multivariate data sets. Phase 1
The project establishes the technical feasibility of a visualization workstation for very large data sets. The Phase 1 system consists of an IBM PC/AT with 2 Mbytes of expanded memory, frame buffer, and write-once optical disk drive. The latter provides random access to 200 Mbytes on a removable medium. Data from a supercomputer (or from any process, such as an experiment, that generates voluminous data in matrix form) can be written to this medium and easily transported (e.g., mailed) to the user's worksite. Software has been developed that will afford the user interactive visual access to these data in the …