PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU

Sao, Piyush; Prokopenko, Andrey; Lebrun-Grandie, Damien

doi:10.1145/3673038.3673148

Conference · Thu Aug 01 00:00:00 EDT 2024

DOI: https://doi.org/10.1145/3673038.3673148 · OSTI ID: 2438688

Author affiliation (all three authors): ORNL

This paper introduces Pandora, a parallel algorithm for computing dendrograms, the hierarchical cluster trees used in single linkage clustering (SLC). Current parallel approaches construct dendrograms by partitioning a minimum spanning tree and removing edges. However, they struggle with the skewed, hard-to-parallelize dendrograms that arise in real-world data. Consequently, computing the dendrogram is the sequential bottleneck in HDBSCAN*[21], a popular SLC variant. Pandora addresses this limitation with recursive tree contraction: it contracts nodes to construct progressively smaller trees, computes the dendrogram of the smallest contracted tree, and then expands that dendrogram by reinserting the contracted edges. This recursive strategy is highly parallel, skew-independent, work-optimal, and well-suited for GPUs and multicore CPUs. We develop a performance-portable implementation of Pandora in Kokkos[31] and evaluate its performance on multicore CPUs and multi-vendor GPUs (e.g., NVIDIA, AMD) for dendrogram construction in HDBSCAN*. Multithreaded Pandora is 2.2x faster than the best current multithreaded implementation. Our GPU version achieves a 6-20x speedup on AMD GPUs and a 10-37x speedup on NVIDIA GPUs over multithreaded Pandora. Pandora removes HDBSCAN*’s sequential bottleneck, greatly boosting efficiency, particularly on GPUs.
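
To make concrete what a single-linkage dendrogram is, the sketch below builds one from the edges of a minimum spanning tree using the classic sequential bottom-up approach (sort edges by weight, merge clusters with union-find). This is not Pandora's tree-contraction algorithm and is not taken from the paper; it only illustrates the kind of sequential step that the abstract describes as a bottleneck. The function name `single_linkage_dendrogram` and the merge-tuple layout are assumptions made for this illustration.

```python
def single_linkage_dendrogram(num_points, mst_edges):
    """Build a single-linkage dendrogram from MST edges (u, v, weight).

    Hypothetical helper for illustration only. Leaves are 0..num_points-1;
    the i-th merge creates a new cluster with id num_points + i.
    Returns a list of (cluster_a, cluster_b, weight, merged_size) tuples.
    """
    parent = list(range(num_points))  # union-find forest over the points
    label = list(range(num_points))   # dendrogram node id held by each root
    size = [1] * num_points           # number of points under each root

    def find(x):
        # Find the root of x, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    # Processing edges from lightest to heaviest yields merges bottom-up.
    for u, v, w in sorted(mst_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            raise ValueError("edges contain a cycle, so they are not a tree")
        if size[ru] < size[rv]:
            ru, rv = rv, ru           # attach the smaller cluster to the larger
        merges.append((label[ru], label[rv], w, size[ru] + size[rv]))
        parent[rv] = ru
        size[ru] += size[rv]
        label[ru] = num_points + len(merges) - 1
    return merges


# A balanced tree: two pairs merge first, then the pairs merge.
balanced = single_linkage_dendrogram(4, [(0, 1, 1.0), (2, 3, 2.0), (1, 2, 3.0)])

# A path with increasing weights: every merge adds a single point,
# which gives the skewed, chain-like dendrogram shape.
skewed = single_linkage_dendrogram(4, [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0)])
```

In the second example each merge depends on the result of the previous one, so the dendrogram has depth proportional to the number of points. That chain-like shape is one way to picture the skew the abstract refers to.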

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 2438688

Resource Type: Conference paper/presentation

Conference Information: 53rd International Conference on Parallel Processing (ICPP2024) - Visby, Gotland, Sweden - 8/12/2024-8/15/2024

Country of Publication: United States

Language: English

Similar Records

PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU

Conference · Wed Jul 31 20:00:00 EDT 2024 · OSTI ID:3017042

Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs

Conference · Thu Dec 31 23:00:00 EST 2020 · OSTI ID:1892057

A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs

Conference · Sat Dec 31 23:00:00 EST 2022 · Proceedings of the International Conference on Parallel Processing · OSTI ID:1922321