U.S. Department of Energy
Office of Scientific and Technical Information

Multi-core and Many-core Shared-memory Parallel Raycasting Volume Rendering Optimization and Tuning

Journal Article · International Journal of High Performance Computing Applications
OSTI ID:1076796
Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. Given the dramatic variation across platforms in both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
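The abstract's approach (vary tunable settings, measure runtime, keep the fastest configuration) can be illustrated with a minimal exhaustive-search sketch. This is not the authors' code: the `autotune` helper, the `toy_tiled_kernel` stand-in, and the `tile_width`/`tile_height` parameters are hypothetical names chosen for illustration, and a real study would time an actual raycasting renderer and also record cache-miss counters.

```python
import itertools
import time


def autotune(kernel, search_space, repeats=3):
    """Time `kernel` for every parameter combination in `search_space`.

    Returns (best_config, timings). `timings` maps each configuration,
    stored as a tuple of (name, value) pairs, to its fastest observed
    runtime in seconds.
    """
    names = sorted(search_space)
    timings = {}
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(**config)
            best = min(best, time.perf_counter() - start)
        timings[tuple(config.items())] = best
    best_key = min(timings, key=timings.get)
    return dict(best_key), timings


def toy_tiled_kernel(tile_width, tile_height, image_size=64):
    """Hypothetical stand-in for a renderer: visit pixels tile by tile.

    The result does not depend on the tile shape; only the traversal
    order (and therefore the runtime) changes.
    """
    total = 0
    for ty in range(0, image_size, tile_height):
        for tx in range(0, image_size, tile_width):
            for y in range(ty, min(ty + tile_height, image_size)):
                for x in range(tx, min(tx + tile_width, image_size)):
                    total += x * y
    return total


if __name__ == "__main__":
    # Assumed search space; the values are illustrative only.
    space = {"tile_width": [4, 16, 64], "tile_height": [4, 16, 64]}
    best, timings = autotune(toy_tiled_kernel, space)
    print("fastest configuration:", best)
```

Because the fastest configuration depends on the machine, the sketch reports whatever it measures rather than a fixed answer, which is the motivation for auto-tuning given in the abstract.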
Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
Computational Research Division
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1076796
Report Number(s):
LBNL-5362E
Journal Information:
Journal Name: International Journal of High Performance Computing Applications
Country of Publication:
United States
Language:
English

Similar Records

Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning
Journal Article · Mon Apr 02 20:00:00 EDT 2012 · International Journal of High Performance Computing Applications · OSTI ID:1565091

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems
Journal Article · Mon Jul 12 00:00:00 EDT 2010 · Journal of Physics: Conference Series · OSTI ID:994006

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems
Conference · Mon Jun 14 00:00:00 EDT 2010 · OSTI ID:994007