Multi-core and Many-core Shared-memory Parallel Raycasting Volume Rendering Optimization and Tuning

Howison, Mark

Journal Article · January 31, 2012 · International Journal of High Performance Computing Applications

OSTI ID:1076796

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. And, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
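The auto-tuning strategy the abstract recommends can be illustrated with a minimal sketch. It is an exhaustive parameter sweep, not the authors' tuning code: the names `autotune`, `time_once`, `variation_percent`, and `toy_render`, and the tile-size parameters `tile_w` and `tile_h`, are hypothetical. The workload is a stand-in loop, not a raycaster. The spread formula is one plausible reading of "variation in runtime", assumed here for illustration.

```python
import itertools
import time


def time_once(run, config):
    """Wall-clock seconds for a single call of run(**config)."""
    start = time.perf_counter()
    run(**config)
    return time.perf_counter() - start


def autotune(run, search_space, measure=time_once, repeats=3):
    """Try every configuration in search_space and return the cheapest.

    search_space maps a parameter name to its list of candidate values.
    Each configuration is measured `repeats` times and the minimum is
    kept, which reduces the effect of timing noise.
    Returns (best_config, best_cost, results).
    """
    names = sorted(search_space)
    results = []
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        cost = min(measure(run, config) for _ in range(repeats))
        results.append((cost, config))
    best_cost, best_config = min(results, key=lambda r: r[0])
    return best_config, best_cost, results


def variation_percent(results):
    """Spread between the slowest and fastest configuration,
    as a percentage of the fastest (an assumed definition)."""
    costs = [cost for cost, _ in results]
    return 100.0 * (max(costs) - min(costs)) / min(costs)


def toy_render(tile_w, tile_h, image=64):
    """Stand-in workload: visit a square image tile by tile.

    This only mimics the loop structure of a tiled renderer; the
    per-pixel work (x ^ y) is a placeholder for casting a ray.
    """
    total = 0
    for ty in range(0, image, tile_h):
        for tx in range(0, image, tile_w):
            for y in range(ty, min(ty + tile_h, image)):
                for x in range(tx, min(tx + tile_w, image)):
                    total += x ^ y
    return total


if __name__ == "__main__":
    space = {"tile_w": [1, 2, 4, 8, 16, 32], "tile_h": [1, 2, 4, 8, 16, 32]}
    best, cost, results = autotune(toy_render, space)
    print("best configuration:", best)
    print("runtime spread: %.0f%%" % variation_percent(results))
```

Because the best configuration is found by measurement on the target machine, the same sweep can be rerun on each platform rather than relying on settings carried over from another architecture. Passing a different `measure` callable would let the sweep minimize another metric, such as a cache-miss count read from hardware counters, if such a measurement is available.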

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sponsoring Organization: Computational Research Division

DOE Contract Number: DE-AC02-05CH11231

OSTI ID: 1076796

Report Number(s): LBNL-5362E

Journal Information: International Journal of High Performance Computing Applications, Related Information: Journal Publication Date: April 2012 (est)

Country of Publication: United States

Language: English

Similar Records

Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

Journal Article · April 3, 2012 · International Journal of High Performance Computing Applications · OSTI ID:1076796

Bethel, E. Wes; Howison, Mark

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · November 29, 2019 · OSTI ID:1076796

Shen, Xipeng

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems

Conference · June 14, 2010 · OSTI ID:1076796

Howison, Mark; Bethel, E Wes; Childs, Hank

Related Subjects

97 MATHEMATICS AND COMPUTING
parallel volume rendering
performance optimization
auto-tuning
multi-core CPU
many-core GPU
