Title: Parallel k-means++

Abstract

This software parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and a multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the squared distance between each data point and the center of its cluster. K-means++ improved upon traditional k-means by selecting the initial seeds for the clustering process more intelligently, favoring seeds that lie far from those already chosen instead of picking all of them uniformly at random. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code that parallelizes the algorithm on three hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process for these platforms lets us perform k-means++ clustering much more quickly than was previously possible.
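The seed selection step described above can be sketched in C++ with OpenMP. This is a minimal illustration written for this record, not the released PNNL code: the names `Point`, `squared_distance`, and `kmeanspp_seeds` are hypothetical, and the sketch assumes the standard k-means++ rule in which each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. The per-point distance update is the part that parallelizes naturally, since each point is handled independently.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// A point is a list of coordinates; all points are assumed to share one dimension.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical helper: returns the indices of k seeds chosen by k-means++ sampling.
std::vector<std::size_t> kmeanspp_seeds(const std::vector<Point>& points,
                                        std::size_t k, std::mt19937& rng) {
    const std::size_t n = points.size();
    std::vector<std::size_t> seeds;
    if (n == 0 || k == 0) return seeds;

    // The first seed is drawn uniformly at random.
    std::uniform_int_distribution<std::size_t> uniform(0, n - 1);
    seeds.push_back(uniform(rng));

    // min_dist[i] holds the squared distance from point i to its nearest seed so far.
    std::vector<double> min_dist(n, std::numeric_limits<double>::max());

    while (seeds.size() < k) {
        const Point& last = points[seeds.back()];
        double total = 0.0;

        // Data-parallel step: only the newest seed can shrink a point's distance.
        // Without OpenMP enabled the pragma is ignored and the loop runs serially.
        #pragma omp parallel for reduction(+ : total)
        for (long long i = 0; i < static_cast<long long>(n); ++i) {
            const double d = squared_distance(points[i], last);
            if (d < min_dist[i]) min_dist[i] = d;
            total += min_dist[i];
        }

        if (total <= 0.0) {
            // Every point coincides with a seed; fall back to a uniform draw.
            seeds.push_back(uniform(rng));
            continue;
        }

        // Draw the next seed with probability proportional to min_dist[i].
        std::discrete_distribution<std::size_t> weighted(min_dist.begin(), min_dist.end());
        seeds.push_back(weighted(rng));
    }
    return seeds;
}
```

A caller would pass the data set, the number of clusters, and a seeded generator, for example `std::mt19937 rng(42); auto seeds = kmeanspp_seeds(points, 8, rng);`, and then hand the selected points to an ordinary k-means iteration as its starting centers. The weighted draw is left serial here for clarity; a GPU or Cray XMT version would likely replace it with a parallel prefix sum, but that is an assumption and is not taken from the record.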

Publication Date:
April 4, 2017
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
Battelle Memorial Institute, Pacific Northwest Division (PNNL)
OSTI Identifier:
1349659
Report Number(s):
Parallel k-means++; 005208MLTPL00
Battelle IPID 31119-E
DOE Contract Number:
AC05-76RL01830
Resource Type:
Software
Software Revision:
00
Software Package Number:
005208
Software CPU:
MLTPL
Open Source:
Yes
No third party code included.
Source Code Available:
Yes
Other Software Info:
Open source; available at https://store.pnnl.gov and on GitHub.
Country of Publication:
United States

Citation Formats

Parallel k-means++. Computer software. https://www.osti.gov//servlets/purl/1349659. Vers. 00. USDOE. 4 Apr. 2017. Web.
(2017, April 4). Parallel k-means++ (Version 00) [Computer software]. https://www.osti.gov//servlets/purl/1349659.
Parallel k-means++. Computer software. Version 00. April 4, 2017. https://www.osti.gov//servlets/purl/1349659.
@misc{osti_1349659,
title = {Parallel k-means++, Version 00},
author = {},
abstractNote = {This software parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and a multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the squared distance between each data point and the center of its cluster. K-means++ improved upon traditional k-means by selecting the initial seeds for the clustering process more intelligently, favoring seeds that lie far from those already chosen instead of picking all of them uniformly at random. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code that parallelizes the algorithm on three hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process for these platforms lets us perform k-means++ clustering much more quickly than was previously possible.},
url = {https://www.osti.gov//servlets/purl/1349659},
doi = {},
year = {2017},
month = {4},
note = {}
}