A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs
Conference · Proceedings of the International Conference on Parallel Processing
- ORNL
Computing the Euclidean minimum spanning tree (EMST) is a computationally demanding step of many algorithms. While work-efficient serial and multithreaded algorithms for computing EMST are known, designing an efficient GPU algorithm is challenging due to a complex branching structure, data dependencies, and load imbalances. In this paper, we propose a single-tree Borůvka-based algorithm for computing EMST on GPUs. We use an efficient nearest neighbor algorithm and reduce the number of required distance calculations by avoiding the traversal of subtrees whose leaf nodes all lie in the same component. The developed algorithms are implemented in a performance portable way using ArborX, an open-source geometric search library based on the Kokkos framework. We evaluate the proposed algorithm on various 2D and 3D datasets and compare it with the current state-of-the-art open-source CPU implementations. We demonstrate a 4-24× speedup over the fastest multi-threaded implementation. We illustrate the portability of our implementation by providing results on a variety of hardware: AMD EPYC 7763, Nvidia A100 and AMD MI250X. We also show the scalability of the implementation, computing EMST for a 3D cosmological dataset of 37 million points in under 0.5 seconds on a single Nvidia A100 GPU.
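The Borůvka scheme mentioned in the abstract can be illustrated with a minimal serial sketch. This is not the paper's GPU implementation and does not use ArborX or Kokkos: the function names `find` and `boruvka_emst` are hypothetical, and a brute-force scan over all points stands in for the tree-based nearest neighbor search. The `continue` on same-component pairs is only a per-point analogue of the subtree skipping described above.

```python
import math


def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def boruvka_emst(points):
    """Return EMST edges as (i, j, distance) for a list of coordinate tuples.

    Assumption: brute-force O(n^2) search per round replaces the
    tree traversal used by the actual algorithm.
    """
    n = len(points)
    parent = list(range(n))
    edges = []
    num_components = n
    while num_components > 1:
        labels = [find(parent, i) for i in range(n)]
        best = {}  # component label -> shortest outgoing edge this round
        for i in range(n):
            ci = labels[i]
            for j in range(n):
                if labels[j] == ci:
                    continue  # same component: no distance computation needed
                d = math.dist(points[i], points[j])
                # Index pair breaks ties so equal-length edges cannot form a cycle.
                cand = (d, min(i, j), max(i, j))
                if ci not in best or cand < best[ci]:
                    best[ci] = cand
        # Merge components along the selected edges.
        for d, i, j in best.values():
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j, d))
                num_components -= 1
    return edges
```

Calling `boruvka_emst([(0, 0), (1, 0), (3, 0), (3, 4)])` yields three edges. Each round at least halves the number of components, so the outer loop runs O(log n) times; the cost that the paper targets is the nearest-neighbor step inside each round.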
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE; USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1922321
- Conference Information:
- Journal Name: Proceedings of the International Conference on Parallel Processing; Journal Volume: 2022
- Country of Publication:
- United States
- Language:
- English
Similar Records
Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes
Conference · Mon May 01 00:00:00 EDT 2023 · OSTI ID: 1994693
Performance Results on CPU/GPU Exascale Architectures for OMEGA: The Ocean Model for E3SM Global Applications
Technical Report · Wed Sep 25 00:00:00 EDT 2024 · OSTI ID: 2448297
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU
Conference · Wed Jul 31 20:00:00 EDT 2024 · OSTI ID: 3017042