Title: Algorithmic Improvements for Portable Event-Based Monte Carlo Transport Using the Nvidia Thrust Library

Journal Article · Transactions of the American Nuclear Society
OSTI ID: 23042651
 [1]; ;  [1];  [2]
  1. Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551 (United States)
  2. Department of Computer and Information Science, University of Oregon, Eugene, OR 97403 (United States)

High performance computing environments are progressively moving towards many-core computing architectures. The Los Alamos National Laboratory Trinity machine, available in late 2016, will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) coprocessors. The Lawrence Livermore National Laboratory Sierra machine, available in 2018, will use an IBM PowerPC architecture along with Nvidia graphics processing units (GPUs). Applications that run in this supercomputing environment must continue to adapt to take advantage of the diverse hardware architectures that are coming. A significant consideration is not only the performance of the application on a given platform but also the portability of the application to other platforms.

The algorithmic improvements presented in this paper build upon recently reported work on event-based Monte Carlo transport in the ALPSMC code, which models particle transport in one-dimensional binary stochastic media. That paper discussed the lack of available vectorization in the traditional history-based algorithm used for Monte Carlo transport and presented a data-parallel event-based algorithm implemented using the Nvidia Thrust library for portability. The performance of the Thrust implementation was compared to that of a native CUDA implementation. That work concluded that the Thrust library abstraction caused too great a loss in performance, but that the event-based method was a viable option that should be investigated further.

In this paper, we describe algorithmic improvements to the data-parallel event-based algorithm previously presented. We made further algorithmic optimizations to the event-based CUDA implementation, most notably data structure changes, a new conditional particle removal scheme in the event-based process, and the use of multiple GPUs. In addition to improving the algorithm, we re-implemented the Thrust version from the now further optimized CUDA version, giving the abstraction a better chance of performing well. Finally, we revisited our previous assumptions about the inability of the history-based method to achieve performance on vector-style architectures such as MICs and GPUs, with surprising and promising results. (authors)
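
The abstract does not spell out the event-based algorithm or the particle removal scheme, so the following is only a generic sketch of the idea, not the ALPSMC implementation. It assumes a single-material one-dimensional slab (not binary stochastic media), a toy random number generator, and a hypothetical struct-of-arrays `ParticleBank`; every name in it is illustrative. Plain C++ loops stand in for the data-parallel primitives: each per-event loop could plausibly map to a `thrust::for_each` or `thrust::transform` call, and the compaction step to a stencil-based `thrust::remove_if`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct-of-arrays particle bank; all names are illustrative.
struct ParticleBank {
    std::vector<double> x;            // position inside the slab [0, width]
    std::vector<double> mu;           // direction cosine along the slab axis
    std::vector<std::uint64_t> rng;   // per-particle random number state
    std::vector<std::uint8_t> alive;  // 1 while the particle is still in flight
    std::size_t size() const { return x.size(); }
};

struct Tally {
    std::size_t absorbed = 0;
    std::size_t leaked = 0;
};

// Toy linear congruential generator returning a uniform double in (0, 1].
// Adequate for a sketch only, not for production transport.
inline double next_uniform(std::uint64_t& state) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return static_cast<double>((state >> 11) + 1) * (1.0 / 9007199254740992.0);
}

// Stream compaction: keep only particles whose alive flag is set, in order.
// Sequential stand-in for a stencil-based remove_if on each array.
std::size_t compact(ParticleBank& p) {
    std::size_t out = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (!p.alive[i]) continue;
        p.x[out] = p.x[i];
        p.mu[out] = p.mu[i];
        p.rng[out] = p.rng[i];
        p.alive[out] = 1;
        ++out;
    }
    p.x.resize(out);
    p.mu.resize(out);
    p.rng.resize(out);
    p.alive.resize(out);
    return out;
}

// Event-based loop: all live particles perform the same event in lockstep,
// instead of following one particle history from birth to death.
Tally run_event_based(std::size_t n, double width, double sigma_t,
                      double absorb_prob, std::uint64_t seed) {
    assert(width > 0.0 && sigma_t > 0.0);
    ParticleBank p;
    p.x.assign(n, 0.0);   // assumed source: beam entering at the left face
    p.mu.assign(n, 1.0);
    p.rng.resize(n);
    p.alive.assign(n, 1);
    for (std::size_t i = 0; i < n; ++i)
        p.rng[i] = (seed + i) * 0x9E3779B97F4A7C15ULL;

    Tally tally;
    while (p.size() > 0) {
        const std::size_t m = p.size();
        // Event 1: every live particle streams to its next collision site.
        for (std::size_t i = 0; i < m; ++i) {
            const double d = -std::log(next_uniform(p.rng[i])) / sigma_t;
            p.x[i] += p.mu[i] * d;
        }
        // Event 2: classify each particle as leaked, absorbed, or scattered.
        // (In a data-parallel version the tallies would be reductions.)
        for (std::size_t i = 0; i < m; ++i) {
            if (p.x[i] < 0.0 || p.x[i] > width) {
                p.alive[i] = 0;
                ++tally.leaked;
            } else if (next_uniform(p.rng[i]) < absorb_prob) {
                p.alive[i] = 0;
                ++tally.absorbed;
            } else {
                p.mu[i] = 2.0 * next_uniform(p.rng[i]) - 1.0;  // isotropic scatter
            }
        }
        // Event 3: remove terminated particles so later passes stay dense.
        compact(p);
    }
    return tally;
}
```

Calling, for example, `run_event_based(10000, 2.0, 1.0, 0.5, 42)` returns a `Tally` whose `absorbed` and `leaked` counts sum to the number of source particles. This sketch compacts after every pass; a removal scheme could instead compact only under some condition, but the abstract gives no detail on what the authors' "conditional" scheme is.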

Journal Information:
Transactions of the American Nuclear Society, Vol. 115; Conference: 2016 ANS Winter Meeting and Nuclear Technology Expo, Las Vegas, NV (United States), 6-10 Nov 2016; Other Information: Country of input: France; 7 refs.; available from American Nuclear Society - ANS, 555 North Kensington Avenue, La Grange Park, IL 60526 (US); ISSN 0003-018X
Country of Publication:
United States
Language:
English