Title: Investigation of Portable Event-Based Monte Carlo Transport Using the NVIDIA Thrust Library

Abstract

Power consumption considerations are driving future high performance computing platforms toward many-core computing architectures. Los Alamos National Laboratory's Trinity machine, available in 2016, will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) architecture coprocessors. Lawrence Livermore National Laboratory's Sierra machine, available in 2018, will use an IBM PowerPC architecture along with Nvidia graphics processing unit (GPU) architecture accelerators. These different advanced architectures make the computing landscape in upcoming years complex. Traditional approaches to Monte Carlo transport do not work efficiently on these new computing platforms. MIC architectures require vectorization to operate efficiently, and vectorization is difficult to achieve in Monte Carlo transport. GPU architectures require additional code to explicitly use the hardware, requiring significant code changes or hardware specific branches in the source code. A significant challenge for Monte Carlo transport projects is to simultaneously support within a single source code base efficient simulations for both the current generation of architectures and the different advanced computing architectures. In order to address these challenges, two important changes are typically required: a new algorithmic approach for solving Monte Carlo transport, and explicit use of hardware specific software. In this paper, we describe initial research investigations of an event-based Monte Carlo transport algorithm implemented using the Nvidia Thrust library on a GPU for a Monte Carlo test code. The event-based algorithm targets many-core architectures by increasing SIMD (single instruction multiple data) parallelism, while Thrust potentially provides portable performance by allowing one source code base to compile code targeted for both CPUs and GPUs.
We described preliminary investigations of portable event-based Monte Carlo algorithms implemented using the Nvidia Thrust library in a research Monte Carlo test code. We found that an explicit CUDA implementation of an event-based Monte Carlo algorithm performed significantly more efficiently than a Thrust implementation on GPU platforms, most likely as a result of additional flexibility in access to different memory spaces on the GPU. Additionally, we showed that on GPU platforms and at large enough problem sizes the event-based implementations perform more efficiently than the serial history-based implementation running on the host CPU. While investigating this problem, we also discovered that the performance of the event-based algorithm is affected by what tallies are being used. A zonal scalar flux tally requires atomic operations that significantly impacted the performance of the code, in some cases producing slowdowns instead of speedups. We decided to remove the tally in order to focus on the effectiveness of the event-based algorithm. Future work will be required to research more effective ways of handling such tallies. Additionally, we would like to consider new ways for optimizing both the Thrust and CUDA versions, in order to see how much performance we have yet to achieve. The potential trade-off between portability and performance was demonstrated in this investigation. Thrust provides both CPU and GPU versions of the code in one code base, but it does so at a cost. We discovered approximately a factor of 2-5 performance difference between Thrust and CUDA on each GPU platform we tested. Thrust provides a tool to access the GPUs with less effort and specialization, but it does so by giving up fine grained control where extra performance can be found for this application. Future work will be required to determine whether this performance differential between Thrust and CUDA can be reduced for event-based Monte Carlo transport. (authors)
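
The contrast between history-based and event-based tracking described in the abstract can be illustrated with a minimal host-side sketch. This is not the authors' test code: the `Particle` struct, the toy absorb-or-scatter collision model, and all function names are hypothetical, and the C++ standard library calls `std::for_each` and `std::partition` stand in for the analogous Thrust algorithms (`thrust::for_each`, `thrust::partition`) that would apply one event to a whole batch of particles on a device. No tally is included, so no atomic operations are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy particle; not taken from the paper's test code.
struct Particle {
    std::uint32_t rng;  // per-particle random number state
    int collisions;     // collisions undergone so far
    bool alive;         // false once the particle has been absorbed
};

// Small linear congruential generator returning a value in [0, 1).
inline double next_uniform(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;
    return (state >> 8) * (1.0 / 16777216.0);
}

// One collision event: absorb with probability p_absorb, otherwise scatter.
inline void collide(Particle& p, double p_absorb) {
    ++p.collisions;
    if (next_uniform(p.rng) < p_absorb) p.alive = false;
}

// Create n live particles, each with its own random number stream.
std::vector<Particle> make_particles(std::size_t n) {
    std::vector<Particle> ps(n);
    for (std::size_t i = 0; i < n; ++i) {
        ps[i] = Particle{static_cast<std::uint32_t>(i) * 2654435761u + 1u, 0, true};
    }
    return ps;
}

// History-based: follow each particle from birth to death, one at a time.
long history_based(std::vector<Particle>& ps, double p_absorb) {
    assert(p_absorb > 0.0);  // required for every history to terminate
    long total = 0;
    for (Particle& p : ps) {
        while (p.alive) collide(p, p_absorb);
        total += p.collisions;
    }
    return total;
}

// Event-based: apply the same event to every live particle, then compact
// the live particles to the front so the next pass touches only them.
long event_based(std::vector<Particle>& ps, double p_absorb) {
    assert(p_absorb > 0.0);  // required for the live set to empty out
    auto live_end = ps.end();
    while (live_end != ps.begin()) {
        std::for_each(ps.begin(), live_end,
                      [p_absorb](Particle& p) { collide(p, p_absorb); });
        live_end = std::partition(ps.begin(), live_end,
                                  [](const Particle& p) { return p.alive; });
    }
    long total = 0;
    for (const Particle& p : ps) total += p.collisions;
    return total;
}
```

Because each particle carries its own random number stream in this sketch, both orderings produce the same collision counts; only the loop structure differs. The inner `std::for_each` pass is the part that maps onto SIMD-style execution, since every live particle performs the same operation.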

Authors:
 Bleile, Ryan C. [1]; Brantley, Patrick S.; Dawson, Shawn A.; O'Brien, Matthew J. [1]; Childs, Hank [2]
  1. Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551 (United States)
  2. Department of Computer and Information Science, University of Oregon, Eugene, OR 97403 (United States)
Publication Date:
OSTI Identifier:
22992044
Resource Type:
Journal Article
Journal Name:
Transactions of the American Nuclear Society
Additional Journal Information:
Journal Volume: 114; Journal Issue: 1; Conference: Annual Meeting of the American Nuclear Society, New Orleans, LA (United States), 12-16 Jun 2016; Other Information: Country of input: France; 8 refs.; Available from American Nuclear Society - ANS, 555 North Kensington Avenue, La Grange Park, IL 60526 United States; Journal ID: ISSN 0003-018X
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 73 NUCLEAR PHYSICS AND RADIATION PHYSICS; ACCELERATORS; ALGORITHMS; COMPUTER ARCHITECTURE; LAWRENCE LIVERMORE NATIONAL LABORATORY; LOS ALAMOS; MONTE CARLO METHOD; OPTIMIZATION; PERFORMANCE; PROCESSING; SIMULATION; SLOWING-DOWN

Citation Formats

Bleile, Ryan C., Brantley, Patrick S., Dawson, Shawn A., O'Brien, Matthew J., and Childs, Hank. Investigation of Portable Event-Based Monte Carlo Transport Using the NVIDIA Thrust Library. United States: N. p., 2016. Web.
Bleile, Ryan C., Brantley, Patrick S., Dawson, Shawn A., O'Brien, Matthew J., & Childs, Hank. Investigation of Portable Event-Based Monte Carlo Transport Using the NVIDIA Thrust Library. United States.
Bleile, Ryan C., Brantley, Patrick S., Dawson, Shawn A., O'Brien, Matthew J., and Childs, Hank. 2016. "Investigation of Portable Event-Based Monte Carlo Transport Using the NVIDIA Thrust Library". United States.
@article{osti_22992044,
title = {Investigation of Portable Event-Based Monte Carlo Transport Using the NVIDIA Thrust Library},
author = {Bleile, Ryan C. and Brantley, Patrick S. and Dawson, Shawn A. and O'Brien, Matthew J. and Childs, Hank},
doi = {},
url = {https://www.osti.gov/biblio/22992044},
journal = {Transactions of the American Nuclear Society},
issn = {0003-018X},
number = 1,
volume = 114,
place = {United States},
year = {2016},
month = {6}
}