Title: Multi-core and Many-core Shared-memory Parallel Raycasting Volume Rendering Optimization and Tuning

Journal Article · International Journal of High Performance Computing Applications
OSTI ID:1076796

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. And, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
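The auto-tuning strategy the abstract recommends can be illustrated with a minimal sketch of an exhaustive parameter sweep. This is not the authors' implementation: the parameter names (`tile_width`, `tile_height`), the `synthetic_cost` stand-in for a real renderer, and the definition of variation as (worst - best) / best are all assumptions made for illustration.

```python
import itertools
import time


def measure_runtime(kernel, config, repeats=3):
    """Return the best-of-`repeats` wall-clock time of kernel(**config), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(**config)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(param_grid, cost):
    """Evaluate every configuration in param_grid and pick the cheapest.

    param_grid: dict mapping a parameter name to a list of candidate values.
    cost: callable taking a config dict and returning a number (lower is better).
    Returns (best_config, best_cost, variation_percent). The variation is
    computed as 100 * (worst - best) / best, which is an assumed definition.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        config = dict(zip(names, values))
        results.append((cost(config), config))
    best_cost, best_config = min(results, key=lambda r: r[0])
    worst_cost = max(r[0] for r in results)
    variation = 100.0 * (worst_cost - best_cost) / best_cost
    return best_config, best_cost, variation


def synthetic_cost(config):
    """Hypothetical cost surface standing in for a measured render time."""
    return abs(config["tile_width"] - 16) + abs(config["tile_height"] - 8) + 10


# Hypothetical search space; a real tuner would use the renderer's own settings.
GRID = {"tile_width": [4, 16, 64], "tile_height": [2, 8, 32]}
```

In practice the `cost` argument would wrap a real measurement, for example `lambda cfg: measure_runtime(render, cfg)` for some hypothetical `render` function, and the sweep would be repeated per platform, since the abstract reports that the optimal settings differ across architectures.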

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Computational Research Division
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
1076796
Report Number(s):
LBNL-5362E
Journal Information:
International Journal of High Performance Computing Applications, Related Information: Journal Publication Date: April 2012 (est)
Country of Publication:
United States
Language:
English

Similar Records

Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning
Journal Article · 3 April 2012 · International Journal of High Performance Computing Applications · OSTI ID:1076796

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · 29 November 2019 · OSTI ID:1076796

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems
Conference · 14 June 2010 · OSTI ID:1076796