Title: Multivariate Clustering of Large-Scale Simulation Data

Conference
OSTI ID: 15004528

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale data sets over the spatiotemporal space. Modeling such massive data sets is an essential step in helping scientists discover new information from their computer simulations. In this paper, we present a simple but effective multivariate clustering algorithm for large-scale scientific simulation data sets. Our algorithm utilizes the cosine similarity measure to cluster the field variables in a data set. Field variables include all variables except the spatial (x, y, z) and temporal (time) variables. The exclusion of the spatial space is important since 'similar' characteristics could be located (spatially) far from each other. To scale our multivariate clustering algorithm for large-scale data sets, we take advantage of the geometrical properties of the cosine similarity measure. This allows us to reduce the modeling time from O(n^2) to O(n × g(f(u))), where n is the number of data points, f(u) is a function of the user-defined clustering threshold, and g(f(u)) is the number of data points satisfying the threshold f(u). We show that on average g(f(u)) is much less than n. Finally, even though spatial variables do not play a role in building a cluster, it is desirable to associate each cluster with its correct spatial space. To achieve this, we present a linking algorithm for connecting each cluster to the appropriate nodes of the data set's topology tree (where the spatial information of the data set is stored). Our experimental evaluations on two large-scale simulation data sets illustrate the value of our multivariate clustering and linking algorithms.
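As a rough illustration of the ideas in the abstract, the sketch below groups data points by the cosine similarity of their field variables and then records which spatial nodes each cluster touches. It is not the paper's algorithm: the abstract does not give the details of the geometric pruning or of the topology tree, so this uses a simple greedy "leader" scheme that compares each point against one representative per cluster. The function names `cosine_cluster` and `link_clusters`, the `threshold` parameter, and the flat `node_ids` labels standing in for topology tree nodes are all assumptions made for the example.

```python
import numpy as np


def cosine_cluster(fields, threshold):
    """Greedy leader-style clustering by cosine similarity (illustrative only).

    fields:    (n, d) array of field variables; spatial (x, y, z) and time
               columns are assumed to have been dropped already.
    threshold: minimum cosine similarity to a cluster representative for a
               point to join that cluster (hypothetical parameter).
    Returns an integer label per point and the list of unit-length
    representatives, one per cluster.
    """
    fields = np.asarray(fields, dtype=float)
    norms = np.linalg.norm(fields, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # zero vectors stay zero instead of dividing by 0
    unit = fields / norms  # unit vectors, so a dot product is the cosine

    reps = []  # first point seen in each cluster acts as its representative
    labels = np.empty(len(unit), dtype=int)
    for i, v in enumerate(unit):
        if reps:
            sims = np.array(reps) @ v
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels[i] = best
                continue
        reps.append(v)  # no representative is similar enough: start a cluster
        labels[i] = len(reps) - 1
    return labels, reps


def link_clusters(labels, node_ids):
    """Map each cluster label to the set of spatial nodes its points fall in.

    node_ids is a hypothetical per-point identifier of a topology tree node;
    a real topology tree would be hierarchical rather than a flat label.
    """
    links = {}
    for label, node in zip(labels, node_ids):
        links.setdefault(int(label), set()).add(node)
    return links


if __name__ == "__main__":
    # Five points with two field variables each (made-up values).
    data = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0], [10.0, 0.0]]
    labels, reps = cosine_cluster(data, threshold=0.95)
    print(labels.tolist())
    print(link_clusters(labels, ["a", "a", "b", "c", "c"]))
```

Because only the direction of each field-variable vector matters, points with very different magnitudes (such as the first and last rows above) can land in the same cluster, and a single cluster can link to several spatially separate nodes. The cost here is proportional to the number of points times the number of clusters, which is a different bound from the O(n × g(f(u))) stated in the abstract.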

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15004528
Report Number(s):
UCRL-JC-151860; TRN: US201015%%682
Resource Relation:
Conference: Ninth Association for Computing Machinery International Conference on Knowledge Discovery and Data Mining, Washington, DC, Aug 24 - Aug 27, 2003
Country of Publication:
United States
Language:
English