OSTI.GOV - U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-node and Multi-core Performance Studies of a Monte Carlo code RMC

Journal Article · Transactions of the American Nuclear Society
OSTI ID: 22991920
  1. Department of Engineering Physics, Tsinghua University, Beijing 100084 (China)

Over the past 15 years, the architecture of high-performance computing systems has become increasingly heterogeneous: computing cores are grouped into nodes and supplemented by auxiliary co-processing units such as graphics processing units or Intel Many Integrated Core (MIC) co-processors. Future computing platforms are moving toward larger numbers of nodes and cores with less memory available per node. Today, several production Monte Carlo codes employ a hybrid parallelism strategy that combines shared-memory parallelism within a computing node with message-passing parallelism between nodes, enabling the codes to make full use of the available computing resources both among and within the nodes of a cluster.

This hybrid parallelization is also available in the Reactor Monte Carlo code RMC by activating its MPI and OpenMP parallel algorithms together. On HPC systems, which typically consist of several thousand to tens of thousands of nodes with 16-32 computing cores per node, RMC simulations can be carried out with MPI parallelization between nodes and OpenMP parallelization within each node. Shared-memory parallelism is especially valuable for large-model simulations because it allows the compute cores within a node to share common model information such as the geometry description, material and composition definitions, cross section data, and tally data, thereby reducing the memory usage per processor. Keeping up with the continuing growth in the number of computing resources within modern HPC systems is a significant challenge for teams that develop and maintain high-performance Monte Carlo radiation transport solvers, and the hybrid message-passing and shared-memory approach adopted in RMC is intended to address this challenge.
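The abstract contains no source code; the short C sketch below only illustrates the general shape of such a hybrid scheme, with MPI dividing particle histories among ranks and OpenMP threads within a node sharing the node-local model data. The function name simulate_history, the work-division details, and the history count are illustrative assumptions, not RMC's actual implementation.

/* Minimal sketch of hybrid MPI/OpenMP particle-history parallelism.
 * simulate_history() is a hypothetical placeholder for tracking one
 * neutron history against shared geometry/cross-section data. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

static double simulate_history(long id) { return (double)(id % 7); }

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    /* FUNNELED: only the main thread makes MPI calls (true below). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long total_histories = 1000000;      /* histories in one batch */
    long per_rank = total_histories / nprocs;  /* MPI: inter-node split  */
    long first = rank * per_rank;

    double local_tally = 0.0;
    /* OpenMP: intra-node split; threads share the model data, so large
     * geometry and cross-section arrays are stored once per node rather
     * than once per core. */
    #pragma omp parallel for reduction(+:local_tally) schedule(static)
    for (long i = first; i < first + per_rank; ++i)
        local_tally += simulate_history(i);

    double global_tally = 0.0;
    /* Combine per-rank results across nodes with message passing. */
    MPI_Reduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global tally = %f\n", global_tally);

    MPI_Finalize();
    return 0;
}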
The parallel performance of RMC has been studied using two complex and representative benchmarks, the H-M (Hoogenboom-Martin) benchmark and the BEAVRS benchmark. All simulations were run on the MilkyWay-2 (Tianhe-2) supercomputer, on Intel Xeon E5-2692 12-core, 2.2 GHz processors, with up to 12,288 computing cores available. With pure MPI parallelism, excellent parallel efficiency is achieved even above 12,000 processors: RMC reaches a parallel efficiency of 96% on 12,288 cores for the pin-by-pin computation of the BEAVRS benchmark. With combined MPI/OpenMP parallelism on 6,144 cores, the BEAVRS criticality benchmark runs at 173,314,000 particles per minute, and a pin-wise full-core H-M burnup calculation with millions of depletion regions reaches 1,843,200 burnable regions per minute. Results from these studies demonstrate that RMC, using the hybrid parallel model described in this paper, is able to handle pin-wise full-core burnup calculations with millions of depletion regions and scales well through thousands of processors. (authors)
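The abstract does not state how the 96% parallel efficiency is defined; a common convention, assumed in the sketch below, measures efficiency relative to a reference run on N_0 cores, where both N_0 and the timings T are assumptions not given in the text.

% Parallel efficiency relative to a reference run on N_0 cores.
% T(N) is the wall-clock time for the same fixed problem on N cores;
% E(N_0) = 1 by construction, and the abstract reports E(12288) = 0.96.
\[
  S(N) = \frac{N_0 \, T(N_0)}{T(N)}, \qquad
  E(N) = \frac{S(N)}{N} = \frac{N_0 \, T(N_0)}{N \, T(N)} .
\]

On a per-core basis this is equivalent to comparing particle tracking rates: 96% efficiency on 12,288 cores means each core tracks particles at 0.96 times the per-core rate of the reference run.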

OSTI ID:
22991920
Journal Information:
Transactions of the American Nuclear Society, Vol. 114, Issue 1; Conference: Annual Meeting of the American Nuclear Society, New Orleans, LA (United States), 12-16 Jun 2016; Other Information: Country of input: France; 5 refs.; Available from American Nuclear Society - ANS, 555 North Kensington Avenue, La Grange Park, IL 60526 United States; ISSN 0003-018X
Country of Publication:
United States
Language:
English