OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Chunking of Large Multidimensional Arrays

Abstract

Data-intensive scientific computations, as well as on-line analytical processing (OLAP) applications, are performed on very large datasets that are modeled as k-dimensional arrays. Such arrays are stored on disk by partitioning the large global array into fixed-size hyper-rectangular sub-arrays, called chunks or tiles, that form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays, accessing every chunk that overlaps the query result. An important metric of storage efficiency is therefore the expected number of chunks retrieved over all such queries. The question that immediately arises is: "What chunk shapes give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.
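The abstract does not spell out its two probabilistic models, so the Python sketch below uses a simpler assumed model to make the metric concrete. A query is an axis-aligned box of q_i cells along dimension i, its offset relative to the chunk grid is uniformly distributed, and every chunk holds exactly V cells. Under that assumption a query overlaps 1 + (q_i - 1)/c_i chunks along dimension i in expectation, so the expected count for chunk shape (c_1, ..., c_k) is the product of those terms. The helper names (`chunks_touched`, `expected_chunks`, `workload_cost`, `best_shape`) are hypothetical. The search enumerates integer factorizations of V by brute force; it is not the steepest-descent or geometric-programming method the paper describes.

```python
import math


def chunks_touched(lo, hi, shape):
    """Chunks overlapped by the box query with inclusive cell bounds lo..hi."""
    count = 1
    for l, h, c in zip(lo, hi, shape):
        # Chunk indices along this dimension run from l // c to h // c.
        count *= h // c - l // c + 1
    return count


def expected_chunks(query_sides, shape):
    """Expected chunks touched, assuming a uniformly random offset per dimension."""
    return math.prod(1 + (q - 1) / c for q, c in zip(query_sides, shape))


def workload_cost(workload, shape):
    """Probability-weighted expected chunk count over (prob, query_sides) pairs."""
    return sum(p * expected_chunks(sides, shape) for p, sides in workload)


def factorizations(volume, k):
    """Yield every k-tuple of positive integers whose product is exactly volume."""
    if k == 1:
        yield (volume,)
        return
    for d in range(1, volume + 1):
        if volume % d == 0:
            for rest in factorizations(volume // d, k - 1):
                yield (d,) + rest


def best_shape(workload, volume, k):
    """Brute-force the chunk shape of the given volume with the lowest cost."""
    return min(factorizations(volume, k), key=lambda s: workload_cost(workload, s))


if __name__ == "__main__":
    # Hypothetical workload: every query is a 100 x 10 box; chunks hold 64 cells.
    print(best_shape([(1.0, (100, 10))], 64, 2))
```

Here `best_shape([(1.0, (100, 10))], 64, 2)` returns `(32, 2)`: under this assumed model the cost-minimizing chunk is elongated along the query's long side rather than square, while a symmetric mix of 100 × 10 and 10 × 100 queries yields `(8, 8)`. Brute-force enumeration is only practical for small k and V. Relaxed to real-valued c_i, the objective is a sum of products of terms 1 + (q_i - 1)/c_i, which is a posynomial in the c_i, and that is the kind of problem geometric programming handles.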

Authors:
Rotem, Doron; Otoo, Ekow J.; Seshadri, Sridhar
Publication Date:
February 28, 2007
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director. Office of Science. Advanced Scientific Computing Research
OSTI Identifier:
927033
Report Number(s):
LBNL-63230
R&D Project: 429201; BnR: KJ0101030; TRN: US200810%%206
DOE Contract Number:
DE-AC02-05CH11231
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99; EFFICIENCY; EXACT SOLUTIONS; MATHEMATICAL MODELS; METRICS; PROCESSING; PROGRAMMING; STORAGE; Multi-dimensional Arrays Algorithm Array Chunking

Citation Formats

Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States: N. p., 2007. Web. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., & Seshadri, Sridhar. Chunking of Large Multidimensional Arrays. United States. doi:10.2172/927033.
Rotem, Doron, Otoo, Ekow J., and Seshadri, Sridhar. 2007. "Chunking of Large Multidimensional Arrays". United States. doi:10.2172/927033. https://www.osti.gov/servlets/purl/927033.
@techreport{osti_927033,
title = {Chunking of Large Multidimensional Arrays},
author = {Rotem, Doron and Otoo, Ekow J. and Seshadri, Sridhar},
abstractNote = {Data-intensive scientific computations, as well as on-line analytical processing (OLAP) applications, are performed on very large datasets that are modeled as k-dimensional arrays. Such arrays are stored on disk by partitioning the large global array into fixed-size hyper-rectangular sub-arrays, called chunks or tiles, that form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays, accessing every chunk that overlaps the query result. An important metric of storage efficiency is therefore the expected number of chunks retrieved over all such queries. The question that immediately arises is: "What chunk shapes give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.},
doi = {10.2172/927033},
institution = {Ernest Orlando Lawrence Berkeley National Laboratory},
number = {LBNL-63230},
place = {United States},
year = {2007},
month = feb
}

Technical Report:

  • The SCORE-EVET code was developed to study multidimensional transient fluid flow in nuclear reactor fuel rod arrays. The conservation equations used were derived by volume averaging the transient compressible three-dimensional local continuum equations in Cartesian coordinates. No assumptions associated with subchannel flow have been incorporated into the derivation of the conservation equations. In addition to the three-dimensional fluid flow equations, the SCORE-EVET code contains: (a) a one-dimensional steady-state solution scheme to initialize the flow field, (b) steady-state and transient fuel rod conduction models, and (c) comprehensive correlation packages to describe fluid-to-fuel rod interfacial energy and momentum exchange. Velocity and pressure boundary conditions can be specified as a function of time and space to model reactor transient conditions such as a hypothesized loss-of-coolant accident (LOCA) or flow blockage.
  • The objective of this research is to develop comprehensive constitutive models for multidimensional two-phase flow in rod arrays. The constitutive parameters are the solid-fluid flow resistance and the gas-liquid interfacial momentum exchange force. This report covers work in four areas: (1) a correlation for flow resistance across banks of tubes that is independent of rod arrangement has been developed from data for three rod arrangements covering a Reynolds number range (based on superficial velocity) of 1 to 40,000; (2) complete pressure drop data were taken for water flows in the laminar region in crossflow and in 45° inclined rod arrays; (3) a model for the interfacial momentum exchange force in bubbly flows has been completed, validated against single-bubble velocity data in inclined rod arrays, and cast in a form suitable for implementation in two-fluid computer codes; and (4) rise velocities of bubbles in 0°, 45°, and 90° inclined rod arrays have been measured. These data should prove useful for developing a bubble drag coefficient model for rod arrays.
  • The objective of this research is to develop comprehensive constitutive models for the hydrodynamics of flows at oblique angles in rod arrays, and to determine their impact on the design and performance analysis of heat-exchanging components. The constitutive parameters are, for single-phase flow, the solid-fluid flow resistance and, for two-phase flow, the phase flow resistances and relative phase motion. This report covers the accomplishments of three tasks: (1) superposition models, data, and correlations for multidimensional single-phase flow resistance are reviewed and compared; (2) two-phase flow observations and an individual bubble trajectory model for oblique flows are presented; and (3) single-phase flow analyses of several baffled heat exchangers were performed to determine the sensitivity of calculated pressure and flow fields to the choice of superposition model.
  • The project establishes the technical feasibility of a visualization workstation for very large data sets. The Phase 1 system consists of an IBM PC/AT with 2 Mbytes of expanded memory, a frame buffer, and a write-once optical disk drive. The latter provides random access to 200 Mbytes on a removable medium. Data from a supercomputer (or from any process, such as an experiment, that generates voluminous data in matrix form) can be written to this medium and easily transported (e.g., mailed) to the user's worksite. Software has been developed that gives the user interactive visual access to these data in the form of orthogonal sections and contour surface renderings. Strategies are developed for displaying multivariate three-dimensional data and for producing interactive animated displays of data in three dimensions plus time.