Parallel k-means++
Parallel k-means++ is a parallelization of the k-means++ seed selection algorithm for three distinct hardware platforms: GPU, multicore CPU, and a multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the squared distance between each data point and the center of its cluster. K-means++ improves upon traditional k-means by selecting the initial seeds for the clustering process more intelligently: each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize it. We have developed original C++ code that parallelizes the algorithm on three hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the seed selection for these platforms allows k-means++ clustering to run substantially faster than was previously possible.
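For readers unfamiliar with the seeding step, the following is a minimal C++ sketch of k-means++ seed selection, with the data-parallel distance update marked for OpenMP. It is an illustration only and is not taken from the Parallel k-means++ source: the names `Point`, `squared_distance`, and `kmeanspp_seeds` are hypothetical, the input is assumed to contain at least `k` distinct points, and the final selection scan is left serial for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical point type for this sketch: one coordinate per dimension.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Returns the indices of k seeds chosen by k-means++ (D^2) weighting.
// Assumes `points` holds at least k distinct points.
std::vector<std::size_t> kmeanspp_seeds(const std::vector<Point>& points,
                                        std::size_t k, unsigned rng_seed) {
    assert(k >= 1 && k <= points.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(points.size());
    std::mt19937 rng(rng_seed);

    // The first seed is drawn uniformly at random.
    std::vector<std::size_t> seeds;
    std::uniform_int_distribution<std::size_t> uniform(0, points.size() - 1);
    seeds.push_back(uniform(rng));

    // min_dist2[i]: squared distance from point i to its nearest seed so far.
    std::vector<double> min_dist2(points.size(),
                                  std::numeric_limits<double>::max());

    while (seeds.size() < k) {
        const Point& latest = points[seeds.back()];
        double total = 0.0;

        // Data-parallel step: every point is updated independently and the
        // weights are summed with a reduction. The pragma only takes effect
        // when the compiler has OpenMP enabled.
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::size_t idx = static_cast<std::size_t>(i);
            const double d2 = squared_distance(points[idx], latest);
            if (d2 < min_dist2[idx]) min_dist2[idx] = d2;
            total += min_dist2[idx];
        }
        assert(total > 0.0);  // fails if fewer than k distinct points exist

        // Draw the next seed with probability min_dist2[i] / total by
        // scanning the running sum of weights (serial in this sketch).
        std::uniform_real_distribution<double> draw(0.0, total);
        const double r = draw(rng);
        std::size_t chosen = 0;
        double cumulative = 0.0;
        for (std::size_t i = 0; i < points.size(); ++i) {
            if (min_dist2[i] <= 0.0) continue;  // coincides with a seed
            chosen = i;  // last eligible point doubles as rounding fallback
            cumulative += min_dist2[i];
            if (cumulative > r) break;
        }
        seeds.push_back(chosen);
    }
    return seeds;
}
```

Calling `kmeanspp_seeds(points, 3, 42u)` returns three distinct indices into `points`. The sketch keeps a running minimum distance per point, so each round only compares against the most recently chosen seed; that loop is the natural place to apply OpenMP, and on a GPU it would map onto a per-point transform followed by a reduction.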
- Short Name / Acronym:
- Parallel k-means++
- Site Accession Number:
- 7426; Battelle IPID 31119
- Software Type:
- Scientific
- License(s):
- Other (Commercial or Open-Source)
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- Primary Award/Contract Number:
- AC05-76RL01830
- DOE Contract Number:
- AC05-76RL01830
- Code ID:
- 54998
- OSTI ID:
- code-54998
- Country of Origin:
- United States
Similar Records
A Highly Parallel Implementation of K-Means for Multithreaded Architecture
Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs