A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data
- ORNL
- Lawrence Berkeley National Laboratory (LBNL)
We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate data set. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data (SPMD) programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger data sets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- DE-AC05-00OR22725
- OSTI ID:
- 1110857
- Journal Information:
- Technometrics, Vol. 55, Issue 4; ISSN 0040-1706
- Country of Publication:
- United States
- Language:
- English
Similar Records
Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers
A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data