Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

Bethel, E. Wes; Howison, Mark

doi:10.1177/1094342012440466

Journal Article · April 3, 2012 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342012440466 · OSTI ID: 1565091

Bethel, E. Wes [1]; Howison, Mark [2]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
[2] Brown Univ., Providence, RI (United States). Center for Computation and Visualization

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. In addition, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
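The auto-tuning strategy the abstract recommends can be illustrated with a minimal sketch of an exhaustive empirical sweep: run the kernel under every combination of tunable settings, record the runtime, and keep the fastest configuration. This is not the authors' code. The function names (`autotune`, `variation_percent`), the parameter names (`layout`, `tile`), and the stand-in kernel are hypothetical, and the spread formula is only one plausible way to express runtime variation; the abstract does not define how its percentages are computed.

```python
import itertools
import time


def autotune(kernel, search_space, repeats=3, timer=time.perf_counter):
    """Time `kernel` under every configuration in `search_space`.

    `search_space` maps a parameter name to the list of values to try.
    Returns (best_config, best_seconds, results), where `results` is a
    list of (config, seconds) pairs for every configuration tested.
    """
    names = sorted(search_space)
    results = []
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        # Keep the minimum over several runs to damp timing noise.
        fastest = float("inf")
        for _ in range(repeats):
            start = timer()
            kernel(**config)
            fastest = min(fastest, timer() - start)
        results.append((config, fastest))
    best_config, best_seconds = min(results, key=lambda r: r[1])
    return best_config, best_seconds, results


def variation_percent(results):
    """Spread between slowest and fastest configuration, in percent.

    Assumption: (slowest - fastest) / fastest * 100.
    """
    times = [seconds for _, seconds in results]
    return 100.0 * (max(times) - min(times)) / min(times)


if __name__ == "__main__":
    def toy_kernel(layout, tile):
        # Hypothetical stand-in for a renderer; does arbitrary busy work.
        stride = 1 if layout == "layout_a" else 7
        return sum(range(0, 200_000, stride * tile))

    space = {"layout": ["layout_a", "layout_b"], "tile": [1, 2, 4]}
    best, seconds, all_results = autotune(toy_kernel, space)
    print(best, seconds, variation_percent(all_results))
```

Because the best configuration is found by measurement rather than by a model, the same sweep can be rerun on each platform, which matches the abstract's observation that optimal settings differ across platforms in ways that are hard to predict.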

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

Grant/Contract Number: AC02-05CH11231; AC05-00OR22725

OSTI ID: 1565091

Journal Information: International Journal of High Performance Computing Applications, Vol. 26, Issue 4; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Computer Science
