Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Davis, CA (United States)
- Univ. of California, Davis, CA (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Streamline computation in a very large vector field data set represents a significant challenge due to the non-local and data-dependent nature of streamline integration. In this paper, we conduct a study of the performance characteristics of hybrid parallel programming and execution as applied to streamline integration on a large, multicore platform. With multi-core processors now prevalent in clusters and supercomputers, there is a need to understand the impact of these hybrid systems in order to make the best implementation choice. We use two MPI-based distribution approaches based on established parallelization paradigms, parallelize-over-seeds and parallelize-over-blocks, and present a novel MPI-hybrid algorithm for each approach to compute streamlines. Our findings indicate that the work sharing between cores in the proposed MPI-hybrid parallel implementation results in much improved performance and consumes less communication and I/O bandwidth than a traditional, non-hybrid distributed implementation.
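As a rough illustration of the two distribution paradigms named in the abstract, the sketch below shows only how initial work might be assigned to MPI ranks. It is not the paper's implementation. The function names, the 2D uniform block grid, and the round-robin block ownership are all assumptions made for this example, and no actual MPI calls or streamline integration are included.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not taken from the paper.

def assign_over_seeds(seeds, num_ranks):
    """Parallelize-over-seeds: split the seed list into contiguous,
    near-equal chunks, one chunk per rank."""
    base, extra = divmod(len(seeds), num_ranks)
    chunks, start = [], 0
    for rank in range(num_ranks):
        size = base + (1 if rank < extra else 0)
        chunks.append(seeds[start:start + size])
        start += size
    return chunks

def assign_over_blocks(seeds, block_size, blocks_per_axis, num_ranks):
    """Parallelize-over-blocks: each rank owns a fixed set of data blocks
    (round-robin here, an assumption) and receives the seeds that fall
    inside the blocks it owns."""
    owned = defaultdict(list)
    for x, y in seeds:
        bx = int(x // block_size)          # block column containing the seed
        by = int(y // block_size)          # block row containing the seed
        block_id = by * blocks_per_axis + bx
        owned[block_id % num_ranks].append((x, y))
    return dict(owned)

if __name__ == "__main__":
    seeds = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (0.2, 0.2)]
    print(assign_over_seeds(seeds, 2))
    print(assign_over_blocks(seeds, 1.0, 2, 2))
```

In the first scheme every rank has a similar number of streamlines but may need data from any block. In the second, data stays put and the seed count per rank depends on where the seeds fall.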
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1016364
- Report Number(s):
- LBNL-4563E; TRN: US201112%%320
- Journal Information:
- IEEE Transactions on Visualization and Computer Graphics, Vol. 17, Issue 11; ISSN 1077-2626
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English