OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  2. Brown Univ., Providence, RI (United States). Center for Computation and Visualization

Given the computing industry's trend of increasing processing capacity by adding more cores to a chip, this work focuses on tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and to measure performance in terms of absolute runtime and L2 cache misses. Our results show wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in non-obvious ways. For example, on the GPU the optimal configurations occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm, because its unstructured memory access pattern varies locally for individual rays and globally with the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those found in previous studies on older architectures. In addition, given the dramatic variation across platforms in both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy of performance optimization through auto-tuning. These benefits will likely become more pronounced as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
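The tuning strategy described above (sweep tunable settings and memory layouts, time each configuration, keep the fastest) can be illustrated with a toy exhaustive auto-tuner. This is a minimal Python sketch under stated assumptions, not the authors' code: the synthetic volume, orthographic rays along z, the toy transfer function, image tile size as the tunable parameter, early ray termination as the "known optimization", and the names `make_volume`, `render`, and `autotune` are all hypothetical. The two layouts shown are conventional array order (x varies fastest) and a Z-order (Morton) curve, one common alternative; the paper's actual parameters and layouts may differ.

```python
import itertools
import time

# Hypothetical sketch: names, parameters and the synthetic data are illustrative only.


def morton3(x, y, z, bits=10):
    """Z-order (Morton) index: interleave the low `bits` bits of x, y and z."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def make_volume(n, layout):
    """Build an n^3 synthetic scalar field (n a power of two, n >= 2) in `layout`."""
    if layout == "array":
        def index(x, y, z):
            return (z * n + y) * n + x  # array order: x varies fastest
    elif layout == "zorder":
        index = morton3  # a bijection onto [0, n^3) when n is a power of two
    else:
        raise ValueError(f"unknown layout: {layout}")
    data = [0.0] * (n ** 3)
    c = (n - 1) / 2.0
    for z in range(n):
        for y in range(n):
            for x in range(n):
                r2 = ((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) / (c * c)
                data[index(x, y, z)] = max(0.0, 1.0 - r2)  # dense center, empty corners
    return data, index


def render(n, data, index, tile, early_stop=0.99, opacity=0.1):
    """Cast one orthographic ray per pixel along +z, visiting pixels tile by tile."""
    image = [[0.0] * n for _ in range(n)]
    for ty in range(0, n, tile):
        for tx in range(0, n, tile):
            for y in range(ty, min(ty + tile, n)):
                for x in range(tx, min(tx + tile, n)):
                    color, alpha = 0.0, 0.0
                    for z in range(n):
                        s = data[index(x, y, z)]
                        a = s * opacity                    # toy transfer function
                        color += (1.0 - alpha) * a * s     # front-to-back compositing
                        alpha += (1.0 - alpha) * a
                        if alpha >= early_stop:            # early ray termination
                            break
                    image[y][x] = color
    return image


def autotune(n, tiles=(1, 4, 16), layouts=("array", "zorder"), repeats=3):
    """Exhaustively time every (layout, tile) pair; return the fastest and all timings."""
    volumes = {layout: make_volume(n, layout) for layout in layouts}
    timings = {}
    for layout, tile in itertools.product(layouts, tiles):
        data, index = volumes[layout]
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            render(n, data, index, tile)
            best = min(best, time.perf_counter() - t0)  # keep the best of `repeats` runs
        timings[(layout, tile)] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    config, timings = autotune(16)
    for (layout, tile), t in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{layout:6s} tile={tile:3d}  {t * 1e3:7.2f} ms")
    print("fastest:", config)
```

Every configuration produces the same image, so the tuner is free to pick whichever is fastest. In pure Python the timings mostly reflect interpreter overhead rather than cache behavior; a real study would run compiled, parallel code (threads per tile on CPUs, thread blocks on GPUs) and read hardware counters for L2 misses.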

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725
OSTI ID:
1565091
Journal Information:
International Journal of High Performance Computing Applications, Vol. 26, Issue 4; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 12 works (citation information provided by Web of Science)

Similar Records

Multi-core and Many-core Shared-memory Parallel Raycasting Volume Rendering Optimization and Tuning
Journal Article · January 31, 2012 · International Journal of High Performance Computing Applications · OSTI ID:1565091

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · November 29, 2019

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems
Conference · June 14, 2010