skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multivariate Clustering of Large-Scale Scientific Simulation Data

Conference ·
OSTI ID:15005754

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale data sets over the spatio-temporal space. Modeling such massive data sets is an essential step in helping scientists discover new information from their computer simulations. In this paper, we present a simple but effective multivariate clustering algorithm for large-scale scientific simulation data sets. Our algorithm utilizes the cosine similarity measure to cluster the field variables in a data set. Field variables include all variables except the spatial (x, y, z) and temporal (time) variables. The exclusion of the spatial dimensions is important since ''similar'' characteristics could be located (spatially) far from each other. To scale our multivariate clustering algorithm for large-scale data sets, we take advantage of the geometrical properties of the cosine similarity measure. This allows us to reduce the modeling time from O(n{sup 2}) to O(n x g(f(u))), where n is the number of data points, f(u) is a function of the user-defined clustering threshold, and g(f(u)) is the number of data points satisfying f(u). We show that on average g(f(u)) is much less than n. Finally, even though spatial variables do not play a role in building clusters, it is desirable to associate each cluster with its correct spatial region. To achieve this, we present a linking algorithm for connecting each cluster to the appropriate nodes of the data set's topology tree (where the spatial information of the data set is stored). Our experimental evaluations on two large-scale simulation data sets illustrate the value of our multivariate clustering and linking algorithms.

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15005754
Report Number(s):
UCRL-JC-153698; TRN: US200324%%239
Resource Relation:
Conference: Third IEEE International Conference on Mining, Melbourne, FL (US), 11/19/2003--11/22/2003; Other Information: PBD: 13 Jun 2003
Country of Publication:
United States
Language:
English