OSTI.GOV - U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-node and Multi-core Performance Studies of a Monte Carlo code RMC

Journal Article · Transactions of the American Nuclear Society
OSTI ID: 22991920
  1. Department of Engineering Physics, Tsinghua University, Beijing 100084 (China)

Over the past 15 years, the architecture of high-performance computing systems has become increasingly heterogeneous: computing cores are grouped into nodes and supplemented by auxiliary co-processing units such as graphics processing units or Intel Many Integrated Core (MIC) co-processors. Future computing platforms are moving toward larger numbers of nodes and cores with less memory available per node. Today, several production Monte Carlo codes employ a hybrid parallelism strategy that combines shared-memory parallelism within a computing node with message-passing parallelism between nodes, enabling the codes to make full use of the available computing resources both among and within the nodes of a cluster.

This hybrid parallelization is also available in the Reactor Monte Carlo code RMC by activating its MPI and OpenMP parallel algorithms together. On HPC systems, which typically consist of several thousand to tens of thousands of nodes with 16-32 computing cores per node, RMC simulations can be carried out with MPI parallelization between nodes and OpenMP parallelization within each node. Shared-memory parallelism is especially valuable for large-model simulations because it allows the compute cores within a node to share common model information such as the geometry description, material and composition definitions, cross section data, and tally data, thereby reducing the memory usage per processor. Keeping up with the continuing growth in the number of computing resources within modern HPC systems is a significant challenge for teams that develop and maintain high-performance Monte Carlo radiation transport solvers, and the hybrid message-passing and shared-memory approach adopted in RMC is intended to address this challenge.
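The abstract contains no source code; the short C sketch below only illustrates the general shape of such a hybrid scheme, with MPI dividing particle histories among ranks and OpenMP threads within a node sharing the node-local model data. The function name simulate_history, the work-division details, and the history count are illustrative assumptions, not RMC's actual implementation.

/* Minimal sketch of hybrid MPI/OpenMP particle-history parallelism.
 * simulate_history() is a hypothetical placeholder for tracking one
 * neutron history against shared geometry/cross-section data. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

static double simulate_history(long id) { return (double)(id % 7); }

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    /* FUNNELED: only the main thread makes MPI calls (true below). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long total_histories = 1000000;      /* histories in one batch */
    long per_rank = total_histories / nprocs;  /* MPI: inter-node split  */
    long first = rank * per_rank;

    double local_tally = 0.0;
    /* OpenMP: intra-node split; threads share the model data, so large
     * geometry and cross-section arrays are stored once per node rather
     * than once per core. */
    #pragma omp parallel for reduction(+:local_tally) schedule(static)
    for (long i = first; i < first + per_rank; ++i)
        local_tally += simulate_history(i);

    double global_tally = 0.0;
    /* Combine per-rank results across nodes with message passing. */
    MPI_Reduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global tally = %f\n", global_tally);

    MPI_Finalize();
    return 0;
}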
The parallel performance of RMC has been studied using two complex and representative benchmarks, the H-M (Hoogenboom-Martin) benchmark and the BEAVRS benchmark. All simulations were run on the MilkyWay-2 (Tianhe-2) supercomputer, on Intel Xeon E5-2692 12-core, 2.2 GHz processors, with up to 12,288 computing cores available. With pure MPI parallelism, excellent parallel efficiency is achieved even above 12,000 processors: RMC reaches a parallel efficiency of 96% on 12,288 cores for the pin-by-pin computation of the BEAVRS benchmark. With combined MPI/OpenMP parallelism on 6,144 cores, the BEAVRS criticality benchmark runs at 173,314,000 particles per minute, and a pin-wise full-core H-M burnup calculation with millions of depletion regions reaches 1,843,200 burnable regions per minute. Results from these studies demonstrate that RMC, using the hybrid parallel model described in this paper, is able to handle pin-wise full-core burnup calculations with millions of depletion regions and scales well through thousands of processors. (authors)
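The abstract does not state how the 96% parallel efficiency is defined; a common convention, assumed in the sketch below, measures efficiency relative to a reference run on N_0 cores, where both N_0 and the timings T are assumptions not given in the text.

% Parallel efficiency relative to a reference run on N_0 cores.
% T(N) is the wall-clock time for the same fixed problem on N cores;
% E(N_0) = 1 by construction, and the abstract reports E(12288) = 0.96.
\[
  S(N) = \frac{N_0 \, T(N_0)}{T(N)}, \qquad
  E(N) = \frac{S(N)}{N} = \frac{N_0 \, T(N_0)}{N \, T(N)} .
\]

On a per-core basis this is equivalent to comparing particle tracking rates: 96% efficiency on 12,288 cores means each core tracks particles at 0.96 times the per-core rate of the reference run.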

OSTI ID:
22991920
Journal Information:
Transactions of the American Nuclear Society, Vol. 114, Issue 1; Conference: Annual Meeting of the American Nuclear Society, New Orleans, LA (United States), 12-16 Jun 2016; Other Information: Country of input: France; 5 refs.; Available from American Nuclear Society - ANS, 555 North Kensington Avenue, La Grange Park, IL 60526 United States; ISSN 0003-018X
Country of Publication:
United States
Language:
English