Abstract
This software parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the sum of squared distances between each data point and the center of its cluster. K-means++ improves upon traditional k-means by choosing the initial seeds more carefully: after the first seed is chosen uniformly at random, each subsequent seed is selected with probability proportional to its squared distance from the nearest seed already chosen. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code that parallelizes the algorithm on three architectures: GPUs using NVIDIA's CUDA/Thrust framework, multicore CPUs using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process for these platforms lets us perform k-means++ clustering much faster than was previously possible.
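The seeding step described above can be sketched in C++ as follows. This is a minimal, hypothetical illustration, not the repository's implementation: the names `Point`, `squaredDistance`, and `kmeansppSeeds` are invented here, and only the per-point distance update is parallelized, using an OpenMP `parallel for` pragma (the pragma is ignored and the loop runs serially if the code is compiled without OpenMP support). The first seed is drawn uniformly at random; each later seed is drawn with probability proportional to D(x)², the squared distance from x to its nearest existing seed.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical point type: one coordinate per dimension.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical k-means++ seeding routine; returns indices into `data`.
// May return fewer than k seeds if the data has fewer than k distinct points.
std::vector<std::size_t> kmeansppSeeds(const std::vector<Point>& data,
                                       std::size_t k, std::mt19937& rng) {
    std::vector<std::size_t> seeds;
    const std::size_t n = data.size();
    if (n == 0 || k == 0) return seeds;

    // First seed: uniform over all points.
    std::uniform_int_distribution<std::size_t> pickFirst(0, n - 1);
    seeds.push_back(pickFirst(rng));

    // minDist[i] = D(x_i)^2, the squared distance to the nearest seed so far.
    std::vector<double> minDist(n, std::numeric_limits<double>::max());

    while (seeds.size() < k) {
        const Point& newest = data[seeds.back()];

        // Only the newest seed can shrink D(x_i)^2, and each i is updated
        // independently, so this loop is safe to run in parallel.
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(n); ++i) {
            const double d = squaredDistance(data[i], newest);
            if (d < minDist[i]) minDist[i] = d;
        }

        double total = 0.0;
        for (double d : minDist) total += d;
        if (total <= 0.0) break;  // every point coincides with a seed

        // Next seed: drawn with probability proportional to D(x)^2.
        // Existing seeds have weight 0 and cannot be drawn again.
        std::discrete_distribution<std::size_t> pickNext(minDist.begin(),
                                                         minDist.end());
        seeds.push_back(pickNext(rng));
    }
    return seeds;
}
```

Each point's update depends only on that point and the newest seed, which is why it is the natural loop to parallelize; the weighted draw remains serial in this sketch. For example, with four 2-D points forming two well-separated pairs, `kmeansppSeeds(data, 2, rng)` returns two distinct indices, and with high probability one from each pair.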
- Release Date: 2017-04-04
- Project Type: Open Source, Publicly Available Repository
- Software Type: Scientific
- Licenses: Other (Commercial or Open-Source): https://github.com/patrickmackey/parallel_kpp/blob/master/LICENSE.txt
- Sponsoring Org.: USDOE
- Primary Award/Contract Number: AC05-76RL01830
- Code ID: 54998
- Site Accession Number: 7426; Battelle IPID 31119
- Research Org.: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Country of Origin: United States
Citation Formats
MLA: Parallel k-means++. Computer Software. https://github.com/patrickmackey/parallel_kpp. USDOE. 04 Apr. 2017. Web. doi:10.11578/dc.20210416.80.

APA: (2017, April 04). Parallel k-means++. [Computer software]. https://github.com/patrickmackey/parallel_kpp. https://doi.org/10.11578/dc.20210416.80.

Chicago: "Parallel k-means++." Computer software. April 04, 2017. https://github.com/patrickmackey/parallel_kpp. https://doi.org/10.11578/dc.20210416.80.
@misc{doecode_54998,
title = {Parallel k-means++},
abstractNote = {This software parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the sum of squared distances between each data point and the center of its cluster. K-means++ improves upon traditional k-means by choosing the initial seeds more carefully: after the first seed is chosen uniformly at random, each subsequent seed is selected with probability proportional to its squared distance from the nearest seed already chosen. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code that parallelizes the algorithm on three architectures: GPUs using NVIDIA's CUDA/Thrust framework, multicore CPUs using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process for these platforms lets us perform k-means++ clustering much faster than was previously possible.},
doi = {10.11578/dc.20210416.80},
url = {https://doi.org/10.11578/dc.20210416.80},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20210416.80}},
year = {2017},
month = apr
}