Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing
Abstract
Here, we propose a dynamically load-balanced algorithm for parallel particle tracing, which periodically attempts to evenly redistribute particles across processes based on k-d tree decomposition. Each process is assigned (1) a statically partitioned, axis-aligned data block that partially overlaps with neighboring blocks in other processes and (2) a dynamically determined k-d tree leaf node that bounds the active particles for computation; the bounds of the k-d tree nodes are constrained by the geometries of the data blocks. Given a certain degree of overlap between blocks, our method can balance the number of particles as much as possible. Compared with other load-balancing algorithms for parallel particle tracing, the proposed method does not require any preanalysis, does not use any heuristics based on flow features, does not make any assumptions about seed distribution, does not move any data blocks during the run, and does not need any master process for work redistribution. Based on a comprehensive performance study with up to 8K processes on a Blue Gene/Q system, the proposed algorithm outperforms baseline approaches in both load balance and scalability on various flow visualization and analysis problems.
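The constraint described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. It assumes that a split plane is chosen at the median of the active particle coordinates and is then clamped to the interval where two neighboring data blocks overlap, so that each resulting leaf stays inside its own block. The function names `constrained_split` and `partition` and the parameters `overlap_lo` and `overlap_hi` are hypothetical.

```python
from statistics import median


def constrained_split(coords, overlap_lo, overlap_hi):
    """Pick a split coordinate along one axis.

    Assumption for illustration: the ideal split is the median of the
    particle coordinates, but it must lie inside the overlap interval
    [overlap_lo, overlap_hi] shared by the two neighboring data blocks.
    """
    if overlap_lo > overlap_hi:
        raise ValueError("overlap interval is empty")
    if not coords:
        # No particles: any plane in the overlap works; use its midpoint.
        return (overlap_lo + overlap_hi) / 2.0
    ideal = median(coords)
    # Clamp the ideal (perfectly balanced) split to the allowed interval.
    return min(max(ideal, overlap_lo), overlap_hi)


def partition(coords, split):
    """Divide particles between the two sides of the split plane."""
    left = [x for x in coords if x < split]
    right = [x for x in coords if x >= split]
    return left, right


if __name__ == "__main__":
    particles = [0.1, 0.2, 0.3, 0.4, 0.9]
    # Narrow overlap: the median (0.3) is outside, so the split is clamped.
    s = constrained_split(particles, 0.45, 0.55)
    print(s, partition(particles, s))
    # Wide overlap: the median itself is allowed.
    s = constrained_split(particles, 0.0, 1.0)
    print(s, partition(particles, s))
```

In this sketch a narrow overlap forces an uneven 4-to-1 split, while a wide overlap admits the median, which is consistent with the abstract's statement that the achievable balance depends on the degree of overlap between blocks.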
- Authors:
- Zhang, Jiang; Guo, Hanqi; Hong, Fan; Yuan, Xiaoru; Peterka, Tom
- Peking Univ., Beijing (China)
- Argonne National Lab. (ANL), Lemont, IL (United States)
- Publication Date:
- 2017-08-28
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Natural Science Foundation of China (NSFC); National Key Basic Research Program of China
- OSTI Identifier:
- 1465513
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- IEEE Transactions on Visualization and Computer Graphics
- Additional Journal Information:
- Journal Volume: 24; Journal Issue: 1; Journal ID: ISSN 1077-2626
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; parallel particle tracing; dynamic load balancing; k-d trees; performance analysis
Citation Formats
Zhang, Jiang, Guo, Hanqi, Hong, Fan, Yuan, Xiaoru, and Peterka, Tom. Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing. United States: N. p., 2017. Web. doi:10.1109/TVCG.2017.2744059.
Zhang, Jiang, Guo, Hanqi, Hong, Fan, Yuan, Xiaoru, & Peterka, Tom. Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing. United States. https://doi.org/10.1109/TVCG.2017.2744059
Zhang, Jiang, Guo, Hanqi, Hong, Fan, Yuan, Xiaoru, and Peterka, Tom. 2017. "Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing". United States. https://doi.org/10.1109/TVCG.2017.2744059. https://www.osti.gov/servlets/purl/1465513.
@article{osti_1465513,
title = {Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing},
author = {Zhang, Jiang and Guo, Hanqi and Hong, Fan and Yuan, Xiaoru and Peterka, Tom},
abstractNote = {Here, we propose a dynamically load-balanced algorithm for parallel particle tracing, which periodically attempts to evenly redistribute particles across processes based on k-d tree decomposition. Each process is assigned (1) a statically partitioned, axis-aligned data block that partially overlaps with neighboring blocks in other processes and (2) a dynamically determined k-d tree leaf node that bounds the active particles for computation; the bounds of the k-d tree nodes are constrained by the geometries of the data blocks. Given a certain degree of overlap between blocks, our method can balance the number of particles as much as possible. Compared with other load-balancing algorithms for parallel particle tracing, the proposed method does not require any preanalysis, does not use any heuristics based on flow features, does not make any assumptions about seed distribution, does not move any data blocks during the run, and does not need any master process for work redistribution. Based on a comprehensive performance study with up to 8K processes on a Blue Gene/Q system, the proposed algorithm outperforms baseline approaches in both load balance and scalability on various flow visualization and analysis problems.},
doi = {10.1109/TVCG.2017.2744059},
journal = {IEEE Transactions on Visualization and Computer Graphics},
number = 1,
volume = 24,
place = {United States},
year = {2017},
month = aug
}