DOE CODE: Project Metadata for Code ID 54998

Parallel k-means++

Abstract

This project parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and a multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the mean squared distance between each data point and the center of its cluster. K-means++ improves upon traditional k-means by selecting the initial seeds more intelligently: each new seed is sampled with probability proportional to its squared distance from the nearest seed already chosen. Although k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize it. We have developed original C++ code that parallelizes the algorithm on three hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process on these platforms lets us perform k-means++ clustering much more quickly than was previously possible.
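The seed selection step described in the abstract can be illustrated with a minimal sketch for the multicore CPU case. This is not code from the parallel_kpp repository; the names `PointSet`, `sq_dist`, and `kpp_seeds` are hypothetical, and the layout (row-major coordinates, seeds returned as point indices) is an assumption made for the example. The sketch parallelizes only the distance update with an OpenMP reduction and leaves the weighted sampling serial; a parallel prefix sum is one possible replacement for that serial scan.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical container: n points of dimension dim, stored row-major.
struct PointSet {
    std::size_t n;
    std::size_t dim;
    std::vector<double> coords;  // size n * dim
};

// Squared Euclidean distance between points i and j.
inline double sq_dist(const PointSet& p, std::size_t i, std::size_t j) {
    double s = 0.0;
    for (std::size_t d = 0; d < p.dim; ++d) {
        const double diff = p.coords[i * p.dim + d] - p.coords[j * p.dim + d];
        s += diff * diff;
    }
    return s;
}

// Returns the indices of k seeds chosen with k-means++ (D^2) weighting.
std::vector<std::size_t> kpp_seeds(const PointSet& p, std::size_t k, unsigned rng_seed) {
    assert(p.n > 0 && k >= 1 && k <= p.n && p.coords.size() == p.n * p.dim);
    std::mt19937_64 rng(rng_seed);
    std::uniform_int_distribution<std::size_t> pick(0, p.n - 1);

    std::vector<std::size_t> seeds;
    seeds.reserve(k);
    seeds.push_back(pick(rng));  // first seed: uniform at random

    // nearest[i] = squared distance from point i to its closest seed so far.
    std::vector<double> nearest(p.n, std::numeric_limits<double>::max());

    for (std::size_t s = 1; s < k; ++s) {
        const std::size_t last = seeds.back();
        double total = 0.0;

        // Parallel step: fold the newest seed into nearest[] and sum the weights.
        // Without OpenMP enabled the pragma is ignored and the loop runs serially.
        #pragma omp parallel for reduction(+ : total)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(p.n); ++i) {
            const std::size_t idx = static_cast<std::size_t>(i);
            const double d = sq_dist(p, idx, last);
            if (d < nearest[idx]) nearest[idx] = d;
            total += nearest[idx];
        }

        // Serial step: sample an index with probability nearest[i] / total.
        std::size_t chosen = pick(rng);  // used only if every weight is zero
        if (total > 0.0) {
            const double r = std::uniform_real_distribution<double>(0.0, total)(rng);
            double cum = 0.0;
            for (std::size_t i = 0; i < p.n; ++i) {
                if (nearest[i] <= 0.0) continue;  // existing seeds carry no weight
                chosen = i;  // keeps the last weighted point as a rounding fallback
                cum += nearest[i];
                if (cum >= r) break;
            }
        }
        seeds.push_back(chosen);
    }
    return seeds;
}
```

Because a point that is already a seed has weight zero, the sketch never selects the same point twice as long as the input points are distinct. With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag.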

Release Date: 2017-04-04

Project Type: Open Source, Publicly Available Repository

Software Type: Scientific

Licenses: Other (Commercial or Open-Source): https://github.com/patrickmackey/parallel_kpp/blob/master/LICENSE.txt

Sponsoring Org.: USDOE

Primary Award/Contract Number: AC05-76RL01830

Code ID: 54998

Site Accession Number: 7426; Battelle IPID 31119

Research Org.: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Country of Origin: United States

Citation Formats

Parallel k-means++. Computer Software. https://github.com/patrickmackey/parallel_kpp. USDOE. 04 Apr. 2017. Web. doi:10.11578/dc.20210416.80.

(2017, April 04). Parallel k-means++. [Computer software]. https://github.com/patrickmackey/parallel_kpp. https://doi.org/10.11578/dc.20210416.80.

"Parallel k-means++." Computer software. April 04, 2017. https://github.com/patrickmackey/parallel_kpp. https://doi.org/10.11578/dc.20210416.80.

@misc{ doecode_54998,

title = {Parallel k-means++},

author = {},

abstractNote = {This project parallelizes the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and a multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the mean squared distance between each data point and the center of its cluster. K-means++ improves upon traditional k-means by selecting the initial seeds more intelligently: each new seed is sampled with probability proportional to its squared distance from the nearest seed already chosen. Although k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize it. We have developed original C++ code that parallelizes the algorithm on three hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. Parallelizing the process on these platforms lets us perform k-means++ clustering much more quickly than was previously possible.},

doi = {10.11578/dc.20210416.80},

url = {https://doi.org/10.11578/dc.20210416.80},

howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20210416.80}},

year = {2017},

month = {apr}

}
