Title: Statistical Modeling of Large-Scale Simulation Data

Abstract

With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gathering knowledge from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation), reduces the data storage requirements and access times in two stages. First, it creates and stores mathematical and statistical models of the data. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but highly effective statistical modeling techniques for simulation data. Our first modeling technique computes the true mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. In our second statistical modeling technique, we use the Anderson-Darling goodness-of-fit method on systematic partitions of the data. This second method evaluates a model by how well it passes the normality test on the data. Both of our statistical models summarize the data so as to answer range queries in the most effective way. We calculate precision on an answer to a query by scaling the one-sided Chebyshev inequalities with the original mesh's topology. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on large simulation data sets.
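
As a rough illustration of the first technique described in the abstract, the sketch below models each fixed-size partition of a one-dimensional array by its mean, scores the model with a plain root mean square error, and evaluates the standard one-sided Chebyshev (Cantelli) bound. It is a minimal sketch under stated assumptions, not AQSim's implementation: the abstract does not specify the RMSE variant, the partitioning scheme, or the mesh-topology scaling, so none of those are reproduced here, and the function names (partition_models, model_rmse, cantelli_upper_tail) are hypothetical.

```python
import math
from statistics import fmean, pvariance


def partition_models(values, size):
    """Model each consecutive block of `size` values by (mean, variance, count).

    Assumption: fixed-size blocks over a flat list stand in for the
    "systematic partitions" of a mesh mentioned in the abstract.
    """
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return [(fmean(b), pvariance(b), len(b)) for b in blocks]


def model_rmse(models):
    """Plain RMSE of replacing every value by its block mean.

    The squared error inside a block equals variance * count, so the
    stored summaries are enough; the raw data is not needed again.
    """
    total = sum(n for _, _, n in models)
    squared_error = sum(var * n for _, var, n in models)
    return math.sqrt(squared_error / total)


def cantelli_upper_tail(variance, k):
    """One-sided Chebyshev (Cantelli) bound for k > 0:
    P(X - mean >= k) <= variance / (variance + k**2).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return variance / (variance + k * k)


if __name__ == "__main__":
    data = [1.0, 3.0, 10.0, 14.0]
    models = partition_models(data, 2)
    print(models)
    print(model_rmse(models))
    print(cantelli_upper_tail(models[1][1], 2.0))
```

A range query could then be answered from the stored (mean, variance, count) triples alone, with the Cantelli bound giving a distribution-free limit on how far a value in a partition may exceed its mean. The second technique would replace the RMSE score with a normality test such as Anderson-Darling.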

Authors:
Eliassi-Rad, T; Critchlow, T; Abdulla, G
Publication Date:
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
15013592
Report Number(s):
UCRL-JC-147226
TRN: US200601%%429
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining., Edmonton, Alberta, Canada, Jul 23 - Jul 26, 2002
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ACCURACY; COMPUTERS; DISTRIBUTION; EXPLORATION; MINING; SHEAR; SIMULATION; STATISTICAL MODELS; STORAGE; TOPOLOGY

Citation Formats

Eliassi-Rad, T, Critchlow, T, and Abdulla, G. Statistical Modeling of Large-Scale Simulation Data. United States: N. p., 2002. Web. doi:10.1145/775047.775118.
Eliassi-Rad, T, Critchlow, T, & Abdulla, G. Statistical Modeling of Large-Scale Simulation Data. United States. https://doi.org/10.1145/775047.775118
Eliassi-Rad, T, Critchlow, T, and Abdulla, G. Fri . "Statistical Modeling of Large-Scale Simulation Data". United States. https://doi.org/10.1145/775047.775118. https://www.osti.gov/servlets/purl/15013592.
@article{osti_15013592,
title = {Statistical Modeling of Large-Scale Simulation Data},
author = {Eliassi-Rad, T and Critchlow, T and Abdulla, G},
abstractNote = {With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gathering knowledge from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation), reduces the data storage requirements and access times in two stages. First, it creates and stores mathematical and statistical models of the data. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but highly effective statistical modeling techniques for simulation data. Our first modeling technique computes the true mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. In our second statistical modeling technique, we use the Anderson-Darling goodness-of-fit method on systematic partitions of the data. This second method evaluates a model by how well it passes the normality test on the data. Both of our statistical models summarize the data so as to answer range queries in the most effective way. We calculate precision on an answer to a query by scaling the one-sided Chebyshev inequalities with the original mesh's topology. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on large simulation data sets.},
doi = {10.1145/775047.775118},
url = {https://www.osti.gov/biblio/15013592},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2002},
month = {2}
}
